metal: pause step 3b pending sx-side fixes (filed 0024-0030)

Step 3b code is wired across UIRenderer + GlyphCache + UIPipeline +
chess game (gpu_mode = .metal on iOS, MetalGPU bound via the GPU
protocol). macOS GL chess, iOS-sim GLES chess, and iOS-sim Metal
triangle (63-metal-clear.sx) all still render.

iOS-sim Metal chess crashes inside replaceRegion while uploading the
1MB font atlas. Bisecting that crash exposed several sx-language
issues, most visibly mid-bisect tracers (NSLog inside if/else branch
bodies) that produced no output, which blocked further investigation.

Filing each finding as examples/issue-NNNN.sx rather than working
around piecemeal:

Bugs:
- 0024 NSLog/foreign-call inside if/else body not producing output
- 0025 C-ABI param coercion incomplete for composites >16B
       (combined direct-call abiCoerceParamType TODO + call_indirect
        path that doesn't apply C-ABI coercion at all)
- 0026 replaceRegion 1MB upload crash (likely downstream of 0025)

Features needed for step 4 + cleanup:
- 0027 Obj-C block bridge (^{...}) for animateWithDuration:
- 0028 Optional protocol box (?GPU = null) alongside the existing
       T = ---; has_T: bool pattern
- 0029 destroy_texture/buffer/shader on GPU protocol
- 0030 extern cross-file globals

Library-side: renderer.sx + glyph_cache.sx + pipeline.sx gain a
`gpu: GPU = ---; has_gpu: bool` field pair + branches that route every
GL touchpoint through the protocol when has_gpu. glyph_cache.init
saves/restores those fields around its memset. pipeline.set_gpu()
propagates to renderer + font. Renderer's MSL shader source added as
UI_MSL_SRC using packed_float2/packed_float4 to keep the 12-float
interleaved vertex layout tight (48 bytes).

metal.sx: dual-phase init (init(null, 0, 0) for eager device+queue,
re-init with the layer once UIKit installs the SxMetalView).
setStorageMode:.shared on every texture descriptor to ensure CPU-
writable atlas pixels on Apple Silicon iOS-sim.

Regression suite: 68 passing, 0 failed. WASM chess build currently
broken under step 3b state (silent compiler crash); documented in
CHECKPOINT.md, likely fallout from one of the filed issues (probably
0028 — the verbose protocol-box pattern). Step 3b resumes after
0024-0030 land.
agra
2026-05-17 21:17:17 +03:00
parent a938c4f900
commit a1647eab9b
11 changed files with 783 additions and 97 deletions

90
examples/issue-0024.sx Normal file

@@ -0,0 +1,90 @@
// issue-0024: NSLog/foreign-side-effect calls placed as the FIRST statement
// of an `if X { ... } else { ... }` branch body do not produce visible
// output, even when the branch is provably taken (the SECOND statement in
// the same body — also a foreign call — does produce output).
//
// ── Observed iOS-side symptom (session 59 bisect) ─────────────────────────
//
// In library/modules/gpu/metal.sx's `metal_create_texture_ios`:
//
//   slot : TextureSlot = .{ tex = tex, bytes_per_pixel = bytes_per_pixel };
//   self.textures.append(slot);
//   NSLog(ns_string("[metal] T6 appended\n".ptr));        // ← fires
//
//   pixels_null := pixels == null;
//   if pixels_null {
//       NSLog(ns_string("[metal] T6b null\n".ptr));       // ← never fires
//   } else {
//       NSLog(ns_string("[metal] T6a non-null\n".ptr));   // ← never fires
//       handle : u32 = xx self.textures.len;
//       metal_update_texture_region_ios(self, handle, 0, 0, w, h, pixels);
//           // ← DOES fire (its first NSLog at fn entry appears in the
//           //   unified log)
//       NSLog(ns_string("[metal] T7 done\n".ptr));        // ← (helper crashed before this)
//   }
//
// T6 appears in the iOS unified log. T6a/T6b never appear. The else
// branch's helper call DOES fire (its own first-statement NSLog inside
// the helper appears). So the else-branch IS entered; just its first
// NSLog statement produces no output.
//
// ── Pure-sx repro below does NOT trigger ───────────────────────────────────
//
// Running `sx run examples/issue-0024.sx` exits 0 (counter == 4 — all
// bumps fired). The bug only manifests with foreign calls (NSLog / ns_string),
// and possibly only when the process subsequently crashes (replaceRegion
// in the metal.sx case) — which raises the alternative hypothesis that
// the missing NSLog output is just iOS unified-logging buffer-loss on
// process death, not a sx compiler bug. The runtime sequence between T6
// and the crash was ~500μs; logs within ~1ms of an unhandled exception
// can be lost to OSLog's internal buffering on Apple Silicon iOS-sim.
//
// ── Investigation plan ─────────────────────────────────────────────────────
//
// Two paths to disambiguate:
// 1. Replace NSLog markers with `write(STDERR_FILENO, ...)` calls
// (synchronous, no OSLog involvement). If markers still don't appear:
// sx compiler bug — likely in src/ir/lower.zig:2166-2196 (the
// `is_value` branch of `lowerIfExpr` and downstream `lowerBlockValue`
// around 922-948). Possible: side-effecting leading statements
// dropped when branches are treated as values.
// 2. If markers DO appear with synchronous write: the iOS-side symptom
// is unified-logging buffer-loss, not a compiler bug. Close this issue
// as "wontfix — diagnostic limitation" and move the iOS debugging to
// foreign-write tracing.
//
// ── Real-world impact ──────────────────────────────────────────────────────
//
// Bisecting issue-0026 (replaceRegion crash) is currently blocked: without
// trustworthy markers inside if/else branches we can't tell which arg
// arrives wrong. Resolution unblocks step 3b of the Metal port.
#import "modules/std.sx";

counter : s64 = 0;

bump :: () { counter = counter + 1; }

probe :: (skip: bool) {
    bump();
    if skip {
        bump();
        bump();
    } else {
        bump();
        bump();
    }
    bump();
}

main :: () -> s32 {
    probe(false);
    // counter == 4 (entry + 2 in false branch + exit) → exit 0
    if counter == 4 then 0 else 1;
}
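
Path 1 of the investigation plan above swaps the NSLog markers for a synchronous write. A rough illustration in C rather than sx: the sketch below pushes markers straight to a file descriptor with write(2), so no stdio or logging buffer sits in between. `trace_marker` and `probe` are hypothetical names, and the marker strings only mimic the T6/T6a/T6b labels from the bisect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Write msg to fd with write(2), looping on short writes. Nothing is
   buffered in user space, so bytes the kernel accepted are not lost if
   the process dies afterwards. Returns bytes written, or -1 on error. */
static long trace_marker(int fd, const char *msg) {
    assert(msg != NULL);
    size_t len = strlen(msg);
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n < 0) return -1;
        done += (size_t)n;
    }
    return (long)done;
}

/* Marker as the FIRST statement of each branch, the shape that went
   missing with NSLog. Returns 1 for the non-null branch, 0 otherwise. */
static int probe(int fd, const void *pixels) {
    trace_marker(fd, "[trace] T6 appended\n");
    if (pixels == NULL) {
        trace_marker(fd, "[trace] T6b null\n");
        return 0;
    } else {
        trace_marker(fd, "[trace] T6a non-null\n");
        return 1;
    }
}
```

Calling `probe(STDERR_FILENO, pixels)` right before the suspect call would show whether the branch markers reach the terminal; if they do, the missing NSLog lines point at log buffering rather than at the compiler.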

94
examples/issue-0025.sx Normal file

@@ -0,0 +1,94 @@
// issue-0025: Composite types larger than 16 bytes are passed without the
// LLVM `byval(<ty>)` attribute, and the `call_indirect` (fn-pointer cast)
// path doesn't apply C-ABI parameter coercion at all. Both gaps cause
// silent shape-mismatch when sx code calls foreign C functions that take
// large aggregates by value, OR when sx code calls a sx fn through a
// fn-pointer typed with a large-aggregate parameter.
//
// ── Two failing forms ─────────────────────────────────────────────────────
//
// (A) Direct call to a sx function with a >16B param:
//
// Wide :: struct { a: s64; b: s64; c: s64; d: s64; } // 32 bytes
// accept :: (w: Wide) -> s64 { w.a + w.b + w.c + w.d; }
// accept(Wide.{ a = 1, b = 10, c = 100, d = 1000 }) // expect 1111
//
// src/ir/emit_llvm.zig:2747-2795 (`abiCoerceParamType`):
// - <=8 bytes → coerced to i64
// - 9-16 bytes → coerced to [2 x i64]
// - >16 bytes → returns llvm_ty unchanged with TODO at line 2793
//
// The TODO is the bug: large composites should be coerced to `ptr`
// with a `byval(struct.T)` LLVM attribute. LLVM's backend then
// materializes the right machine code per target. Today the struct
// is left as-is, so LLVM passes it across registers + stack slots
// in ways that don't match the C ABI.
//
// (B) Indirect call via fn-pointer cast (the `xx objc_msgSend` idiom):
//
// fn_ptr : (Wide) -> s64 = xx accept;
// fn_ptr(Wide.{ ... })
//
// src/ir/emit_llvm.zig:902-967 (`.call_indirect`): both the
// FunctionInfo-known arm (939-952) and the LLVMTypeOf-fallback arm
// (953-956) construct `param_tys[j]` WITHOUT routing through
// `abiCoerceParamType`. So even if (A) is fixed, fn-pointer-cast call
// sites still mis-marshal large composites.
//
// ── Real-world impact ──────────────────────────────────────────────────────
//
// Every `xx objc_msgSend` call site in library/modules/platform/uikit.sx
// + library/modules/gpu/metal.sx. Works in practice today only because:
// - We never pass aggregates >16 bytes by value through fn-pointer casts
// (workaround: declare param as `*T` + pass `@local`; arm64 AAPCS's
// indirect-by-ref happens to match this machine-state-wise).
// - HFAs (CGSize 2×f64, MTLClearColor 4×f64, CGRect 4×f64 as return)
// are correctly classified at emit_llvm.zig:2766-2779.
//
// ── Workarounds in use ─────────────────────────────────────────────────────
//
// library/modules/gpu/metal.sx declares MTLRegion (48B) + MTLScissorRect
// (32B) call sites with `*MTLRegion` / `*MTLScissorRect` and passes
// `@region` / `@rect`. Should not be needed once this issue is fixed.
//
// ── Fix sketch ─────────────────────────────────────────────────────────────
//
// (A) emit_llvm.zig:2793 — return `ptr` and emit `byval(struct.T)` on
// the param via `LLVMAddCallSiteAttribute` / `LLVMCreateTypeAttribute`.
// At call sites, alloca + memcpy + pass the alloca pointer. Apply
// identically at function-definition emission so direct calls roundtrip.
//
// (B) emit_llvm.zig:902-967 — factor out a helper
// `coerceCallParams(param_count, src_args, dst_fn_param_tys)
// -> (coerced_args, coerced_tys)` that wraps `abiCoerceParamType`.
// Use the helper from both arms.
//
// Edge cases to preserve:
// - Variadic foreign functions (printf family) — variadic tail per
// AAPCS64 still passes composites in their natural form. Keep
// existing behavior for variadic args.
// - HFAs already handled at 2766-2779 — don't touch.
// - Structs <=8 bytes coerced to `i64`, 9-16 bytes to `[2 x i64]` —
// don't touch.
#import "modules/std.sx";

Wide :: struct {
    a: s64; b: s64; c: s64; d: s64;
}

accept :: (w: Wide) -> s64 {
    w.a + w.b + w.c + w.d;
}

main :: () -> s32 {
    w := Wide.{ a = 1, b = 10, c = 100, d = 1000 };
    direct := accept(w);                 // exercises path (A)
    if direct != 1111 { return 1; }
    fn_ptr : (Wide) -> s64 = xx accept;
    indirect := fn_ptr(w);               // exercises path (B)
    if indirect != 1111 { return 2; }
    0;
}
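
For comparison, here is roughly what the same repro looks like in C, where the compiler applies the platform ABI on both the direct and the function-pointer path. This is only a reference sketch of the behavior the sx repro expects; `accept_wide`, `call_direct`, and `call_indirect` are illustrative names, not part of the sx tree.

```c
#include <assert.h>
#include <stdint.h>

/* Four 64-bit fields: 32 bytes, past the 16-byte limit discussed above. */
typedef struct { int64_t a, b, c, d; } Wide;

static int64_t accept_wide(Wide w) { return w.a + w.b + w.c + w.d; }

/* Function-pointer type with the same by-value signature. */
typedef int64_t (*WideFn)(Wide);

/* Path (A): direct call with a large aggregate passed by value. */
static int64_t call_direct(Wide w) { return accept_wide(w); }

/* Path (B): the same call through a pointer. `volatile` discourages the
   compiler from folding it back into a direct call. */
static int64_t call_indirect(Wide w) {
    WideFn volatile fp = accept_wide;
    assert(sizeof(Wide) == 32);
    return fp(w);
}
```

Both paths should return the same sum for the same input. A divergence between them in sx would point at the `call_indirect` arm rather than at `abiCoerceParamType`.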

68
examples/issue-0026.sx Normal file

@@ -0,0 +1,68 @@
// issue-0026: Chess game on iOS-sim with `plat.gpu_mode = .metal` crashes
// inside `[MTLTexture replaceRegion:mipmapLevel:withBytes:bytesPerRow:]`
// when uploading the 1024×1024 R8 font atlas. The 1×1 RGBA8 white tex
// through the SAME code path (metal_update_texture_region_ios in
// library/modules/gpu/metal.sx) works.
//
// Blocked on issue-0024 (NSLog inside if/else not firing — or unified-log
// buffer loss on crash; investigation pending) — without a trustworthy
// tracer we can't reliably bisect which arg arrives wrong. Most likely
// cause: this is downstream of issue-0025's ABI gaps (MTLRegion is 48
// bytes and goes through `xx objc_msgSend` cast, which is the
// call_indirect path that issue-0025 part B covers).
//
// ── Reproduction recipe ───────────────────────────────────────────────────
//
// cd /Users/agra/projects/game
// /Users/agra/projects/sx/zig-out/bin/sx build --target ios-sim main.sx \
// --bundle sx-out/ios/SxChess.app --bundle-id co.swipelab.sxchess \
// -F ~/Library/Frameworks
// cp -R assets sx-out/ios/SxChess.app/
// codesign --force --sign - --timestamp=none sx-out/ios/SxChess.app
// xcrun simctl install booted sx-out/ios/SxChess.app
// xcrun simctl launch --terminate-running-process booted co.swipelab.sxchess
// sleep 4 && xcrun simctl io booted screenshot /tmp/sx-chess.png
//
// Expected (after fix): chess board renders via Metal.
// Observed: app launches, then returns immediately to the home screen
// without drawing anything. The simpler examples/63-metal-clear.sx demo
// still renders the colored triangle on the same sim, so the Metal
// pipeline itself works for small uploads.
//
// ── Candidate root causes (in priority order) ─────────────────────────────
//
// 1. issue-0025 fallout (most likely): MTLRegion (48 B by value) passed
// via the *MTLRegion workaround. The call_indirect path (issue-0025
// part B) doesn't ABI-coerce, so the pointer-shaped declaration may
// not actually pass the address in the right register slot for that
// call site shape (6 args, including the indirect aggregate).
//
// 2. iOS-sim Metal-driver limitation: `setStorageMode:.shared` may not be
// honored for r8 textures of this size; default may be `.private`
// which precludes CPU-side replaceRegion. Workaround would be to
// upload via `MTLBuffer` + `MTLBlitCommandEncoder` (newBufferWithBytes
// + copyFromBuffer:sourceOffset:sourceBytesPerRow:...:toTexture:...).
//
// 3. sx-side `xx` cast bug: bytes_per_row : u64 = xx (u32_expr) may
// truncate or sign-extend incorrectly. Less likely (the math comes
// out to 1024, which fits in any width).
//
// ── How to resolve ────────────────────────────────────────────────────────
//
// After issues 0024 + 0025 are landed:
// 1. Re-add the trace NSLog markers ("[metal] U1..U5" in
// metal_update_texture_region_ios) — now they should actually print.
// 2. Re-build + relaunch chess on iOS-sim.
// 3. If U5 fires after U4 (no crash inside msg_replace), the bug was
// ABI-related; declare success and rename this file to
// examples/NN-metal-large-region-upload.sx (next free NN).
// 4. If U4 → crash persists, fall back to the MTLBuffer + blit
// encoder path in metal.sx's create_texture (when pixels != null,
// allocate a temporary MTLBuffer with newBufferWithBytes:length:options:
// then run a one-shot command buffer with a MTLBlitCommandEncoder
// copying the buffer into the texture). This is the Apple-recommended
// approach for large texture initial-uploads.
#import "modules/std.sx";
main :: () -> s32 { 0; }
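
The sizes quoted in the candidate list can be checked with a few lines of C. The structs below are hypothetical mirrors of MTLOrigin / MTLSize / MTLRegion, assuming a 64-bit target where NSUInteger is 8 bytes; `bytes_per_row`, `upload_bytes`, and `full_region` are illustrative helpers, not functions from metal.sx.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: three 64-bit fields each, 48 bytes for the pair. */
typedef struct { uint64_t x, y, z; } Origin;
typedef struct { uint64_t width, height, depth; } Size;
typedef struct { Origin origin; Size size; } Region;

/* Widen BEFORE multiplying so the product is computed in 64 bits. */
static uint64_t bytes_per_row(uint32_t width, uint32_t bytes_per_pixel) {
    return (uint64_t)width * (uint64_t)bytes_per_pixel;
}

/* Total bytes read by a full-texture upload. */
static uint64_t upload_bytes(uint32_t width, uint32_t height, uint32_t bpp) {
    return bytes_per_row(width, bpp) * (uint64_t)height;
}

/* Full-texture 2D region: origin at zero, depth 1. */
static Region full_region(uint32_t width, uint32_t height) {
    assert(width > 0 && height > 0);
    Region r = { { 0, 0, 0 }, { width, height, 1 } };
    return r;
}
```

Under these assumptions the 1024×1024 R8 atlas is a 1,048,576-byte upload with a 1024-byte row stride, while the working 1×1 RGBA8 texture is 4 bytes. Comparing those values against what the callee actually receives is one way to tell candidate 1 from candidate 3.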

50
examples/issue-0027.sx Normal file

@@ -0,0 +1,50 @@
// issue-0027: Feature — support Obj-C blocks (^{...}) so sx code can call
// APIs that take a block parameter. Required for step 4 of the Metal port
// (keyboard lockstep via `[UIView animateWithDuration:animations:^{...}]`),
// and broadly useful for any UIKit/AppKit API.
//
// ── Proposed surface ──────────────────────────────────────────────────────
//
// Option A — comptime intrinsic that wraps a sx closure as a block:
//
// block := objc_block(@my_closure); // returns *void (an id<Block>)
// msg_block(view, sel, 0.3, block); // pass like any id arg
//
// Internals: emit a Block_literal struct constant with the right invoke
// fn pointer, isa, flags, descriptor pointer. Approximately what clang
// generates for ^{...}.
//
// Option B — surface-level syntax `^{ ... }` that lowers to Option A
// automatically. Cleaner for users; more parser work.
//
// Recommended: start with Option A (intrinsic). Migrate to Option B once
// the codegen path is proven.
//
// ── Implementation sketch ────────────────────────────────────────────────
//
// 1. New `library/modules/std/objc_block.sx` defining the Block_literal
// struct that mirrors clang's layout (isa, flags, reserved, invoke fn
// pointer, descriptor pointer).
// 2. `objc_block(fn_or_closure) -> *void` intrinsic that builds the
// literal at the call site. Initial implementation can be a
// stack-allocated block (_NSConcreteStackBlock); upgrade to
// heap-promoted (_Block_copy) once block lifetime exceeds the call.
// 3. Link libSystem's symbols `_NSConcreteStackBlock` and
// `_NSConcreteGlobalBlock` (auto on iOS; may need `#library "System"`
// on macOS).
// 4. (Deferred) surface syntax `^{ ... }` — parser hook + lowering
// to the intrinsic. Must not clash with bitwise XOR `^`.
//
// ── References ────────────────────────────────────────────────────────────
//
// - Apple block ABI spec (clang's "Block Implementation Specification")
// - _NSConcreteStackBlock + _NSConcreteGlobalBlock from libSystem
//
// ── Real-world impact ─────────────────────────────────────────────────────
//
// Without this, the keyboard inset cannot be animated in lockstep with the
// keyboard slide. See library/modules/platform/uikit.sx's
// uikit_keyboard_will_change_frame comments for the deferred lockstep work.
#import "modules/std.sx";
main :: () -> s32 { 0; }
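
To make step 1 of the implementation sketch concrete, here is a minimal C mock of the literal layout (isa, flags, reserved, invoke, descriptor). It is a layout illustration only: `BlockLiteral`, `make_block`, and `call_block` are hypothetical names, `isa` and `flags` are left as placeholders, and nothing here links against the real block runtime.

```c
#include <assert.h>
#include <stddef.h>

struct BlockLiteral;

/* Descriptor: a reserved word plus the size of the literal it describes. */
struct BlockDescriptor {
    unsigned long reserved;
    unsigned long size;
};

/* Field order from the sketch above. Captured variables, if any, would
   follow the descriptor pointer. */
struct BlockLiteral {
    void *isa;
    int flags;
    int reserved;
    void (*invoke)(struct BlockLiteral *self);
    struct BlockDescriptor *descriptor;
};

static int g_invocations = 0;

/* The body of the "closure": the block itself is always the first arg. */
static void my_invoke(struct BlockLiteral *self) {
    (void)self;
    g_invocations += 1;
}

static struct BlockDescriptor g_desc = { 0, sizeof(struct BlockLiteral) };

/* Stand-in for the proposed objc_block(): fills in a literal. A real
   block would point isa at _NSConcreteStackBlock (or the global variant)
   and set the flag bits from the spec; both are placeholders here. */
static struct BlockLiteral make_block(void) {
    struct BlockLiteral b = { NULL, 0, 0, my_invoke, &g_desc };
    return b;
}

/* What a callee does with a block argument: call through invoke. */
static void call_block(struct BlockLiteral *b) {
    assert(b != NULL && b->invoke != NULL);
    b->invoke(b);
}
```

The point of the mock is that a block is an ordinary struct whose fourth field is a function pointer taking the struct itself, which is why an intrinsic that emits the struct (Option A) is enough to start with.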

53
examples/issue-0028.sx Normal file

@@ -0,0 +1,53 @@
// issue-0028: Feature — make protocol boxes assignable to an optional
// type so callers can spell "no GPU bound" as `?GPU = null` instead of
// the verbose `T = ---; has_T: bool` pattern.
//
// ── Current pattern (verbose) ─────────────────────────────────────────────
//
// gpu: GPU = ---;
// has_gpu: bool = false;
// ...
// if self.has_gpu { self.gpu.create_shader(...); }
//
// ── Proposed pattern ──────────────────────────────────────────────────────
//
// gpu: ?GPU = null;
// ...
// if self.gpu != null { self.gpu.create_shader(...); }
//
// ── Where the verbose pattern lives today ─────────────────────────────────
//
// library/modules/ui/renderer.sx — UIRenderer.gpu + has_gpu
// library/modules/ui/glyph_cache.sx — GlyphCache.gpu + has_gpu
// library/modules/ui/pipeline.sx — UIPipeline.gpu + has_gpu (+ set_gpu)
// library/modules/platform/uikit.sx — UIKitPlatform.frame_closure +
// has_frame_closure (Closure type,
// same pattern but on a closure)
//
// ── Implementation sketch ─────────────────────────────────────────────────
//
// Protocol boxes are 2-pointer structs ({vtable, ctx} or {ctx, fn_ptrs...}
// depending on the inline-vs-vtable shape — see src/ir/lower.zig
// `buildProtocolValue` ~7800-7869). `?T` for these can use `vtable_ptr ==
// null` (or `ctx == null`, depending on layout choice) as the "none"
// sentinel — no extra storage needed. This matches the existing
// optional-closure handling at src/ir/emit_llvm.zig where `?Closure` uses
// `fn_ptr == null` as none.
//
// Approach:
// 1. Extend `?T` type construction to accept T being a protocol type.
// Files: src/ir/types.zig + src/ir/lower.zig (type-resolution).
// 2. Implement `optional_wrap` / `optional_unwrap` /
// `optional_has_value` for protocol-typed payloads in
// src/ir/emit_llvm.zig — model after the closure-optional path.
// 3. Keep the existing `T = ---; has_T: bool` pattern working — the
// new `?T` is additive, not a replacement. Don't churn existing
// files (uikit.sx's frame_closure pattern stays).
//
// ── Syntax constraint ─────────────────────────────────────────────────────
//
// `?T` syntax already exists for primitives + pointers. Extending to
// protocols is a type-system change; no new surface syntax needed.
#import "modules/std.sx";
main :: () -> s32 { 0; }
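
The null-sentinel idea from the implementation sketch can be shown in a few lines of C. This is a sketch under the assumption of a `{vtable, ctx}` box; `GpuBox`, `gpu_none`, `gpu_wrap`, and the toy backend are all hypothetical and do not reflect the actual sx protocol layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int (*create_shader)(void *ctx, const char *src);
} GpuVTable;

/* Two-pointer box. A NULL vtable doubles as the "none" state, so the
   optional needs no extra storage. */
typedef struct {
    const GpuVTable *vtable;
    void *ctx;
} GpuBox;

static GpuBox gpu_none(void) {
    GpuBox b = { NULL, NULL };
    return b;
}

static GpuBox gpu_wrap(const GpuVTable *vt, void *ctx) {
    assert(vt != NULL);
    GpuBox b = { vt, ctx };
    return b;
}

static int gpu_has_value(GpuBox b) { return b.vtable != NULL; }

/* Toy backend: counts shader creations in the int that ctx points at. */
static int fake_create_shader(void *ctx, const char *src) {
    int *count = (int *)ctx;
    (void)src;
    *count += 1;
    return *count;
}

static const GpuVTable FAKE_VTABLE = { fake_create_shader };

/* Caller-side pattern: test the sentinel, then dispatch.
   Returns 0 when no backend is bound. */
static int create_shader_if_bound(GpuBox gpu, const char *src) {
    if (!gpu_has_value(gpu)) return 0;
    return gpu.vtable->create_shader(gpu.ctx, src);
}
```

The box stays two pointers wide in both states, which is the property the proposal relies on when it says no extra storage is needed.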

47
examples/issue-0029.sx Normal file

@@ -0,0 +1,47 @@
// issue-0029: Feature — add explicit destructors to the GPU protocol so
// resources can be freed without leaking.
//
// ── Proposed additions to library/modules/gpu/api.sx ──────────────────────
//
// destroy_shader :: (h: ShaderHandle);
// destroy_buffer :: (h: BufferHandle);
// destroy_texture :: (h: TextureHandle);
//
// ── Why ────────────────────────────────────────────────────────────────────
//
// Today, library/modules/ui/glyph_cache.sx's `grow()` method recreates
// the atlas texture at a larger size but has no way to release the old
// one — see the comment in metal.sx that explicitly notes the leak. The
// GL path uses glDeleteTextures(1, @self.texture_id); the GPU protocol
// has no equivalent yet.
//
// ── Implementation notes ──────────────────────────────────────────────────
//
// Metal backend: send `release` to the MTLTexture / MTLBuffer /
// MTLRenderPipelineState (or call CFRelease, since these are
// CFTypeRef-compatible). Clear the corresponding slot in
// MetalGPU.textures / buffers / shaders to `null` / 0.
//
// GL backend (future): glDeleteTextures / glDeleteBuffers / glDeleteProgram.
//
// Handle lifecycle: after destroy, the slot in the backend List is freed.
// New allocations can take that slot or grow the list. Caller's handles
// remain valid until destroy. Don't aggressively re-use slots in MVP;
// keep handles append-only with a `null` marker for destroyed entries
// (matches the current shape).
//
// ── Touch points ──────────────────────────────────────────────────────────
//
// library/modules/gpu/api.sx — add 3 protocol method signatures
// library/modules/gpu/metal.sx — implement them (release + null
// the slot)
// library/modules/ui/glyph_cache.sx — call destroy_texture(old_handle)
// in grow() before creating the
// new atlas
//
// ── Syntax constraint ─────────────────────────────────────────────────────
//
// None — straight protocol-method addition.
#import "modules/std.sx";
main :: () -> s32 { 0; }
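
The handle lifecycle described above (append-only slots, a null marker for destroyed entries) might look like this in C. It is a sketch only: `TextureTable` and its functions are hypothetical, the fixed capacity is arbitrary, and the "handle = index + 1, 0 = invalid" convention is an assumption chosen to keep 0 free as a failure value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TEXTURES 64

typedef struct {
    void *slots[MAX_TEXTURES];
    uint32_t len;
} TextureTable;

/* Append a resource. Returns its handle, or 0 on failure. */
static uint32_t table_create(TextureTable *t, void *resource) {
    assert(t != NULL);
    if (resource == NULL || t->len >= MAX_TEXTURES) return 0;
    t->slots[t->len] = resource;
    t->len += 1;
    return t->len;
}

/* Resolve a handle. Destroyed or out-of-range handles yield NULL. */
static void *table_lookup(const TextureTable *t, uint32_t h) {
    if (h == 0 || h > t->len) return NULL;
    return t->slots[h - 1];
}

/* Null the slot and hand the old resource back so the backend can
   release it. The slot is not reused, so other handles stay stable. */
static void *table_destroy(TextureTable *t, uint32_t h) {
    void *old = table_lookup(t, h);
    if (old != NULL) t->slots[h - 1] = NULL;
    return old;
}
```

A second destroy on the same handle returns NULL, which gives the backend a cheap guard against double release.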

57
examples/issue-0030.sx Normal file

@@ -0,0 +1,57 @@
// issue-0030: Feature — support `extern` global declarations so a global
// declared in one sx source file can be referenced from another without
// parameter threading.
//
// ── Use case from the Metal port ──────────────────────────────────────────
//
// // game/main.sx
// g_metal_gpu : *MetalGPU = null;
//
// // game/chess/pieces.sx
// extern g_metal_gpu : *MetalGPU;
//
// load :: (self: *ChessPieces, path: [:0]u8) {
// ...
// inline if OS == .ios {
// tex := g_metal_gpu.create_texture(w, h, .rgba8, xx pixels);
// } else {
// // GL path
// }
// }
//
// Today, pieces.load takes `has_gpu: bool, gpu: GPU` parameters and
// game/main.sx threads them through. Cross-file `extern` globals would
// let us drop those parameters.
//
// ── Implementation sketch ─────────────────────────────────────────────────
//
// Mirror how foreign function declarations work — declared in one file,
// defined elsewhere, linker resolves. Globals already have first-class
// addresses in the IR; just add an "extern" flag that says "don't emit
// storage, emit a reference."
//
// Files:
// - parser (sx surface syntax for `extern G : T;`)
// - src/ir/lower.zig (record an extern global stub that resolves at
// module-link time)
// - src/ir/emit_llvm.zig (emit an `external` LLVM global)
//
// ── Syntax constraint ─────────────────────────────────────────────────────
//
// `extern G : T;` is a NEW top-level form. Must not clash with:
// - `G :: T;` (type alias)
// - `G : T = ---;` (uninitialized global with explicit type)
// - `G : T;` (does this currently parse as anything?)
//
// The parser MUST reject `extern G : T = expr;` — extern cannot have an
// initializer (the definition lives elsewhere).
//
// ── Caveat ────────────────────────────────────────────────────────────────
//
// Encourages spaghetti globals. Documentation should steer callers toward
// explicit parameter passing where reasonable. Useful for genuine
// process-singletons (the active GPU, the active platform, etc.) where
// threading them through every call site is more noise than signal.
#import "modules/std.sx";
main :: () -> s32 { 0; }
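
C's `extern` is the model the implementation sketch leans on, so a compact C illustration may help. Everything below sits in one translation unit to stay self-contained; in a real build the declaration and the definition would live in different files. `Gpu`, `g_active_gpu`, `set_active_gpu`, and `load_piece` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int texture_count; } Gpu;

/* Declaration only: no storage is emitted for this line. In a multi-file
   build it would sit in the consuming file or a shared header. */
extern Gpu *g_active_gpu;

/* Consumer: uses the global without receiving it as a parameter.
   Returns -1 when no GPU has been bound yet. */
static int load_piece(void) {
    if (g_active_gpu == NULL) return -1;
    g_active_gpu->texture_count += 1;
    return g_active_gpu->texture_count;
}

/* Definition: the single place that owns the storage. */
Gpu *g_active_gpu = NULL;

static void set_active_gpu(Gpu *gpu) {
    assert(gpu != NULL);
    g_active_gpu = gpu;
}
```

Note that the declaration carries no initializer, matching the proposed rule that `extern G : T = expr;` must be rejected.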

View File

@@ -28,6 +28,12 @@ MTL_PIXEL_FORMAT_R8_UNORM :u64: 10;
 MTL_LOAD_ACTION_CLEAR :u64: 2;
 MTL_STORE_ACTION_STORE :u64: 1;
+// MTLStorageMode. For UI atlases + sprites the CPU needs to write pixels
+// and the GPU needs to sample; `.shared` is the safe default. On iOS-sim
+// under Apple Silicon the convenience class method's default storage
+// isn't reliably shared, so we set it explicitly in metal_create_texture_ios.
+MTL_STORAGE_MODE_SHARED :u64: 0;
 // MTLPrimitiveType.
 MTL_PRIMITIVE_TYPE_TRIANGLE :u64: 3;
@@ -84,11 +90,18 @@ MetalGPU :: struct {
 }
 impl GPU for MetalGPU {
+    // Two-phase init: callers can `init(null, 0, 0)` first to allocate
+    // device + queue eagerly (lets the UI pipeline compile shaders before
+    // UIKit hands us a layer), then re-call `init(layer, w, h)` once the
+    // CAMetalLayer is available. The second call only updates the layer
+    // ref + dims; device/queue are preserved.
     init :: (self: *MetalGPU, target: *void, pixel_w: s32, pixel_h: s32) -> bool {
         inline if OS != .ios { return false; }
-        self.layer = target;
-        self.pixel_w = pixel_w;
-        self.pixel_h = pixel_h;
+        if target != null {
+            self.layer = target;
+            self.pixel_w = pixel_w;
+            self.pixel_h = pixel_h;
+        }
         metal_init_ios(self);
     }
@@ -200,12 +213,19 @@ impl GPU for MetalGPU {
 // so non-iOS builds never reference the unresolved Metal symbols below.
 // ───────────────────────────────────────────────────────────────────────────
+// init() may be called twice: once with target==null to create device +
+// queue eagerly (so the UI pipeline can compile shaders before UIKit
+// has a layer for us), then again with target=CAMetalLayer once
+// `-[SxAppDelegate didFinishLaunching:]` has installed the view.
+// Both calls go through this helper; it's idempotent on the device/queue
+// and only touches the layer when one's been supplied.
 metal_init_ios :: (self: *MetalGPU) -> bool {
     inline if OS != .ios { return false; }
-    if self.layer == null { return false; }
-    self.device = MTLCreateSystemDefaultDevice();
-    if self.device == null { return false; }
+    if self.device == null {
+        self.device = MTLCreateSystemDefaultDevice();
+        if self.device == null { return false; }
+    }
     msg_oo : (*void, *void, *void) -> void = xx objc_msgSend;
     msg_ou : (*void, *void, u64) -> void = xx objc_msgSend;
@@ -213,15 +233,19 @@ metal_init_ios :: (self: *MetalGPU) -> bool {
     msg_osize : (*void, *void, CGSize) -> void = xx objc_msgSend;
     msg_o : (*void, *void) -> *void = xx objc_msgSend;
-    msg_oo(self.layer, sel_registerName("setDevice:".ptr), self.device);
-    msg_ou(self.layer, sel_registerName("setPixelFormat:".ptr), MTL_PIXEL_FORMAT_BGRA8_UNORM);
-    msg_ob(self.layer, sel_registerName("setFramebufferOnly:".ptr), 1);
-    size := CGSize.{ width = xx self.pixel_w, height = xx self.pixel_h };
-    msg_osize(self.layer, sel_registerName("setDrawableSize:".ptr), size);
-    self.queue = msg_o(self.device, sel_registerName("newCommandQueue".ptr));
-    if self.queue == null { return false; }
+    if self.queue == null {
+        self.queue = msg_o(self.device, sel_registerName("newCommandQueue".ptr));
+        if self.queue == null { return false; }
+    }
+    if self.layer != null {
+        msg_oo(self.layer, sel_registerName("setDevice:".ptr), self.device);
+        msg_ou(self.layer, sel_registerName("setPixelFormat:".ptr), MTL_PIXEL_FORMAT_BGRA8_UNORM);
+        msg_ob(self.layer, sel_registerName("setFramebufferOnly:".ptr), 1);
+        size := CGSize.{ width = xx self.pixel_w, height = xx self.pixel_h };
+        msg_osize(self.layer, sel_registerName("setDrawableSize:".ptr), size);
+    }
     true;
 }
@@ -457,6 +481,12 @@ metal_create_texture_ios :: (self: *MetalGPU, w: s32, h: s32, format: TextureFor
         pixel_format, xx w, xx h, 0);
     if desc == null { return 0; }
+    // Force shared storage so the CPU can keep writing pixels (atlas updates,
+    // sprite uploads). On iOS-sim under Apple Silicon the convenience class
+    // method's default storage isn't reliably shared for every format.
+    msg_ou_void : (*void, *void, u64) -> void = xx objc_msgSend;
+    msg_ou_void(desc, sel_registerName("setStorageMode:".ptr), MTL_STORAGE_MODE_SHARED);
     msg_oo : (*void, *void, *void) -> *void = xx objc_msgSend;
     tex := msg_oo(self.device, sel_registerName("newTextureWithDescriptor:".ptr), desc);
     if tex == null { return 0; }
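
The two-phase init in the hunks above boils down to "create each resource only if it is missing, configure the layer only if one was supplied". A reduced C sketch of that control flow follows; the `Gpu` struct, the fake device/queue objects, and the `device_creations` counter are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    void *device;
    void *queue;
    void *layer;
    int pixel_w, pixel_h;
    int device_creations;   /* instrumentation for the sketch only */
} Gpu;

/* Stand-ins for the real device and queue objects. */
static int g_fake_device, g_fake_queue;

/* Safe to call twice: first with target == NULL (device + queue only),
   then with the real layer once it exists. */
static bool gpu_init(Gpu *g, void *target, int w, int h) {
    assert(g != NULL);
    if (target != NULL) {           /* phase 2 updates layer ref + dims */
        g->layer = target;
        g->pixel_w = w;
        g->pixel_h = h;
    }
    if (g->device == NULL) {        /* created once, then preserved */
        g->device = &g_fake_device;
        g->device_creations += 1;
    }
    if (g->queue == NULL) {
        g->queue = &g_fake_queue;
    }
    /* Layer configuration would run here, guarded by g->layer != NULL. */
    return true;
}
```

Calling `gpu_init(&g, NULL, 0, 0)` and later `gpu_init(&g, layer, w, h)` leaves the device untouched on the second call, which is what lets shaders compiled in between stay valid.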

View File

@@ -1,5 +1,7 @@
 #import "modules/std.sx";
 #import "modules/opengl.sx";
+#import "modules/gpu/types.sx";
+#import "modules/gpu/api.sx";
 #import "modules/stb_truetype.sx";
 #import "modules/ui/types.sx";
@@ -176,9 +178,20 @@ GlyphCache :: struct {
     last_shape_len: s64;
     last_shape_size_q: u16;
+    // GPU protocol backend. When `has_gpu`, atlas creation + dirty uploads
+    // route through `gpu` instead of raw GL.
+    gpu: GPU = ---;
+    has_gpu: bool = false;
     init :: (self: *GlyphCache, path: [:0]u8, default_size: f32) {
+        // Preserve any pre-set GPU dispatch across the zero-out; the
+        // surrounding struct memset would otherwise wipe it.
+        saved_gpu := self.gpu;
+        saved_has_gpu := self.has_gpu;
         // Zero out the entire struct first (parent may be uninitialized with = ---)
         memset(self, 0, size_of(GlyphCache));
+        self.gpu = saved_gpu;
+        self.has_gpu = saved_has_gpu;
         // Load font file
         file_size : s32 = 0;
@@ -245,15 +258,25 @@ GlyphCache :: struct {
         val_bytes : s64 = self.hash_cap * 8; // s64 per slot (s32 would suffice but alignment)
         self.hash_vals = xx context.allocator.alloc(val_bytes);
-        // Create OpenGL texture
-        glGenTextures(1, @self.texture_id);
-        glBindTexture(GL_TEXTURE_2D, self.texture_id);
-        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
-        glTexImage2D(GL_TEXTURE_2D, 0, xx GL_R8, self.atlas_width, self.atlas_height, 0, GL_RED, GL_UNSIGNED_BYTE, self.bitmap);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, xx GL_LINEAR);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, xx GL_LINEAR);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, xx GL_CLAMP_TO_EDGE);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, xx GL_CLAMP_TO_EDGE);
+        // Create the atlas texture. In GPU-protocol mode we create empty and
+        // let the first `flush()` push the (zero-initialized) bitmap via
+        // update_texture_region; same result as the GL path's glTexImage2D
+        // with the zeroed bitmap, but works whether or not the backend
+        // accepts CPU pixel pointers at create time.
+        if self.has_gpu {
+            self.texture_id = self.gpu.create_texture(
+                self.atlas_width, self.atlas_height, .r8, null);
+            self.dirty = true;
+        } else {
+            glGenTextures(1, @self.texture_id);
+            glBindTexture(GL_TEXTURE_2D, self.texture_id);
+            glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
+            glTexImage2D(GL_TEXTURE_2D, 0, xx GL_R8, self.atlas_width, self.atlas_height, 0, GL_RED, GL_UNSIGNED_BYTE, self.bitmap);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, xx GL_LINEAR);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, xx GL_LINEAR);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, xx GL_CLAMP_TO_EDGE);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, xx GL_CLAMP_TO_EDGE);
+        }
         out("GlyphCache initialized: ");
         out(path);
@@ -406,9 +429,14 @@ GlyphCache :: struct {
     // Upload dirty atlas to GPU
     flush :: (self: *GlyphCache) {
         if self.dirty == false { return; }
-        glBindTexture(GL_TEXTURE_2D, self.texture_id);
-        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
-        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, self.atlas_width, self.atlas_height, GL_RED, GL_UNSIGNED_BYTE, self.bitmap);
+        if self.has_gpu {
+            self.gpu.update_texture_region(self.texture_id, 0, 0,
+                self.atlas_width, self.atlas_height, xx self.bitmap);
+        } else {
+            glBindTexture(GL_TEXTURE_2D, self.texture_id);
+            glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
+            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, self.atlas_width, self.atlas_height, GL_RED, GL_UNSIGNED_BYTE, self.bitmap);
+        }
         self.dirty = false;
     }
@@ -464,16 +492,23 @@ GlyphCache :: struct {
         self.atlas_width = new_w;
         self.atlas_height = new_h;
-        // Recreate GL texture
-        glDeleteTextures(1, @self.texture_id);
-        glGenTextures(1, @self.texture_id);
-        glBindTexture(GL_TEXTURE_2D, self.texture_id);
-        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
-        glTexImage2D(GL_TEXTURE_2D, 0, xx GL_R8, new_w, new_h, 0, GL_RED, GL_UNSIGNED_BYTE, new_bitmap);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, xx GL_LINEAR);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, xx GL_LINEAR);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, xx GL_CLAMP_TO_EDGE);
-        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, xx GL_CLAMP_TO_EDGE);
+        // Recreate atlas at the new size.
+        if self.has_gpu {
+            // No destroy_texture in the GPU protocol yet, so the old atlas
+            // leaks in the backend table until process exit. Atlas grow
+            // is rare so this is acceptable for now.
+            self.texture_id = self.gpu.create_texture(new_w, new_h, .r8, xx new_bitmap);
+        } else {
+            glDeleteTextures(1, @self.texture_id);
+            glGenTextures(1, @self.texture_id);
+            glBindTexture(GL_TEXTURE_2D, self.texture_id);
+            glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
+            glTexImage2D(GL_TEXTURE_2D, 0, xx GL_R8, new_w, new_h, 0, GL_RED, GL_UNSIGNED_BYTE, new_bitmap);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, xx GL_LINEAR);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, xx GL_LINEAR);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, xx GL_CLAMP_TO_EDGE);
+            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, xx GL_CLAMP_TO_EDGE);
+        }
         // Recompute UV coordinates for all cached glyphs
         atlas_wf : f32 = xx new_w;


@@ -1,6 +1,7 @@
#import "modules/std.sx"; #import "modules/std.sx";
#import "modules/allocators.sx"; #import "modules/allocators.sx";
#import "modules/opengl.sx"; #import "modules/opengl.sx";
#import "modules/gpu/api.sx";
#import "modules/ui/types.sx"; #import "modules/ui/types.sx";
#import "modules/ui/render.sx"; #import "modules/ui/render.sx";
#import "modules/ui/events.sx"; #import "modules/ui/events.sx";
@@ -24,6 +25,23 @@ UIPipeline :: struct {
has_body: bool; has_body: bool;
parent_allocator: Allocator; parent_allocator: Allocator;
// GPU protocol backend. When `has_gpu`, the pipeline propagates this
// to its renderer + font, and skips the per-frame GL state setup in
// commit_gpu (Metal bakes blend mode into the pipeline state).
gpu: GPU = ---;
has_gpu: bool = false;
// Set the GPU dispatch BEFORE calling init() / init_font() so the
// shaders + atlas land on the right backend.
set_gpu :: (self: *UIPipeline, gpu: GPU) {
self.gpu = gpu;
self.has_gpu = true;
self.renderer.gpu = gpu;
self.renderer.has_gpu = true;
self.font.gpu = gpu;
self.font.has_gpu = true;
}
init :: (self: *UIPipeline, width: f32, height: f32) { init :: (self: *UIPipeline, width: f32, height: f32) {
self.render_tree = RenderTree.init(); self.render_tree = RenderTree.init();
self.renderer.init(); self.renderer.init();
@@ -149,14 +167,18 @@ UIPipeline :: struct {
} }
commit_gpu :: (self: *UIPipeline) { commit_gpu :: (self: *UIPipeline) {
glEnable(GL_BLEND); if !self.has_gpu {
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); glEnable(GL_BLEND);
glDisable(GL_DEPTH_TEST); glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glDisable(GL_DEPTH_TEST);
}
self.renderer.begin(self.screen_width, self.screen_height, self.font.texture_id); self.renderer.begin(self.screen_width, self.screen_height, self.font.texture_id);
self.renderer.process(@self.render_tree); self.renderer.process(@self.render_tree);
self.renderer.flush(); self.renderer.flush();
glDisable(GL_BLEND); if !self.has_gpu {
glDisable(GL_BLEND);
}
} }
} }
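Note on the `gpu` / `has_gpu` field pair: the sketch below shows the same dispatch pattern in plain C, since it carries most of this change. It is a minimal illustration under stated assumptions, not the sx implementation. `Gpu`, `Atlas`, `atlas_set_gpu`, `atlas_flush`, and `recording_update` are hypothetical names, and the GL fallback is replaced by a counter so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPU protocol: one function pointer per call. */
typedef struct {
    void *impl; /* backend object, e.g. a Metal wrapper */
    void (*update_texture_region)(void *impl, uint32_t tex,
                                  int x, int y, int w, int h,
                                  const uint8_t *pixels);
} Gpu;

/* Hypothetical atlas owner carrying the `gpu` + `has_gpu` pair. */
typedef struct {
    Gpu gpu;          /* only meaningful when has_gpu is true */
    bool has_gpu;
    uint32_t texture_id;
    int atlas_width;
    int atlas_height;
    const uint8_t *bitmap;
    bool dirty;
    int gl_uploads;   /* counts fallback-path uploads in this sketch */
} Atlas;

/* Must run before any upload so the atlas lands on the right backend. */
void atlas_set_gpu(Atlas *a, Gpu gpu) {
    a->gpu = gpu;
    a->has_gpu = true;
}

/* Upload the atlas if dirty, through the protocol when one is bound. */
void atlas_flush(Atlas *a) {
    if (!a->dirty) return;
    if (a->has_gpu) {
        assert(a->gpu.update_texture_region != NULL);
        a->gpu.update_texture_region(a->gpu.impl, a->texture_id, 0, 0,
                                     a->atlas_width, a->atlas_height,
                                     a->bitmap);
    } else {
        a->gl_uploads += 1; /* real code would call glTexSubImage2D here */
    }
    a->dirty = false;
}

/* Test double: treats impl as an int counter and bumps it per upload. */
void recording_update(void *impl, uint32_t tex, int x, int y, int w, int h,
                      const uint8_t *pixels) {
    (void)tex; (void)x; (void)y; (void)w; (void)h; (void)pixels;
    *(int *)impl += 1;
}
```

Keeping the flag next to the handle means every call site needs its own branch, which is the cost issue 0028 (optional protocol box) is meant to remove.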


@@ -2,6 +2,8 @@
#import "modules/compiler.sx"; #import "modules/compiler.sx";
#import "modules/opengl.sx"; #import "modules/opengl.sx";
#import "modules/math"; #import "modules/math";
#import "modules/gpu/types.sx";
#import "modules/gpu/api.sx";
#import "modules/ui/types.sx"; #import "modules/ui/types.sx";
#import "modules/ui/render.sx"; #import "modules/ui/render.sx";
#import "modules/ui/glyph_cache.sx"; #import "modules/ui/glyph_cache.sx";
@@ -13,62 +15,81 @@ UI_VERTEX_BYTES :s64: 48;
MAX_UI_VERTICES :s64: 16384; MAX_UI_VERTICES :s64: 16384;
UIRenderer :: struct { UIRenderer :: struct {
// GL-side handles. Used when `has_gpu` is false (every non-iOS target today).
vao: u32; vao: u32;
vbo: u32; vbo: u32;
shader: u32; shader: u32;
proj_loc: s32; proj_loc: s32;
tex_loc: s32; tex_loc: s32;
// CPU-side vertex scratch buffer — same for both backends.
vertices: [*]f32; vertices: [*]f32;
vertex_count: s64; vertex_count: s64;
screen_width: f32; screen_width: f32;
screen_height: f32; screen_height: f32;
dpi_scale: f32; dpi_scale: f32;
white_texture: u32; white_texture: u32; // GL name OR TextureHandle (both are u32-shaped)
current_texture: u32; current_texture: u32;
draw_calls: s64; draw_calls: s64;
init :: (self: *UIRenderer) { // GPU protocol backend. When `has_gpu`, the renderer routes shader /
// Create shader (ES for WASM/WebGL2 + iOS GLES3, Core for desktop GL 3.3) // buffer / texture / draw calls through `gpu` instead of raw GL. The
inline if OS == .wasm or OS == .ios { // chess game sets this on iOS to a boxed `*MetalGPU`.
self.shader = create_program(UI_VERT_SRC_ES, UI_FRAG_SRC_ES); gpu: GPU = ---;
} else { has_gpu: bool = false;
self.shader = create_program(UI_VERT_SRC_CORE, UI_FRAG_SRC_CORE); mtl_shader: ShaderHandle = 0;
} mtl_vbuf: BufferHandle = 0;
self.proj_loc = glGetUniformLocation(self.shader, "uProj");
self.tex_loc = glGetUniformLocation(self.shader, "uTex");
// Allocate vertex buffer (CPU side) init :: (self: *UIRenderer) {
// Allocate vertex scratch (CPU side) — same for both backends.
buf_size := MAX_UI_VERTICES * UI_VERTEX_BYTES; buf_size := MAX_UI_VERTICES * UI_VERTEX_BYTES;
self.vertices = xx context.allocator.alloc(buf_size); self.vertices = xx context.allocator.alloc(buf_size);
memset(self.vertices, 0, buf_size); memset(self.vertices, 0, buf_size);
self.vertex_count = 0; self.vertex_count = 0;
// Create VAO/VBO
glGenVertexArrays(1, @self.vao);
glGenBuffers(1, @self.vbo);
glBindVertexArray(self.vao);
glBindBuffer(GL_ARRAY_BUFFER, self.vbo);
glBufferData(GL_ARRAY_BUFFER, xx buf_size, null, GL_DYNAMIC_DRAW);
// pos (2 floats)
glVertexAttribPointer(0, 2, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 0);
glEnableVertexAttribArray(0);
// uv (2 floats)
glVertexAttribPointer(1, 2, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 8);
glEnableVertexAttribArray(1);
// color (4 floats)
glVertexAttribPointer(2, 4, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 16);
glEnableVertexAttribArray(2);
// params: corner_radius, border_width, rect_w, rect_h
glVertexAttribPointer(3, 4, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 32);
glEnableVertexAttribArray(3);
glBindVertexArray(0);
self.dpi_scale = 1.0; self.dpi_scale = 1.0;
// 1x1 white texture for solid rects if self.has_gpu {
self.white_texture = create_white_texture(); // ── Metal backend (via GPU protocol) ───────────────────────
self.mtl_shader = self.gpu.create_shader(UI_MSL_SRC, "");
self.mtl_vbuf = self.gpu.create_buffer(buf_size);
white_px : [4]u8 = .[255, 255, 255, 255];
self.white_texture = self.gpu.create_texture(1, 1, .rgba8, xx @white_px[0]);
} else {
// ── GL backend ─────────────────────────────────────────────
// Create shader (ES for WASM/WebGL2 + iOS GLES3, Core for desktop GL 3.3)
inline if OS == .wasm or OS == .ios {
self.shader = create_program(UI_VERT_SRC_ES, UI_FRAG_SRC_ES);
} else {
self.shader = create_program(UI_VERT_SRC_CORE, UI_FRAG_SRC_CORE);
}
self.proj_loc = glGetUniformLocation(self.shader, "uProj");
self.tex_loc = glGetUniformLocation(self.shader, "uTex");
// Create VAO/VBO
glGenVertexArrays(1, @self.vao);
glGenBuffers(1, @self.vbo);
glBindVertexArray(self.vao);
glBindBuffer(GL_ARRAY_BUFFER, self.vbo);
glBufferData(GL_ARRAY_BUFFER, xx buf_size, null, GL_DYNAMIC_DRAW);
// pos (2 floats)
glVertexAttribPointer(0, 2, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 0);
glEnableVertexAttribArray(0);
// uv (2 floats)
glVertexAttribPointer(1, 2, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 8);
glEnableVertexAttribArray(1);
// color (4 floats)
glVertexAttribPointer(2, 4, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 16);
glEnableVertexAttribArray(2);
// params: corner_radius, border_width, rect_w, rect_h
glVertexAttribPointer(3, 4, GL_FLOAT, 0, xx UI_VERTEX_BYTES, xx 32);
glEnableVertexAttribArray(3);
glBindVertexArray(0);
// 1x1 white texture for solid rects
self.white_texture = create_white_texture();
}
} }
begin :: (self: *UIRenderer, width: f32, height: f32, font_texture: u32) { begin :: (self: *UIRenderer, width: f32, height: f32, font_texture: u32) {
@@ -78,15 +99,26 @@ UIRenderer :: struct {
self.current_texture = font_texture; self.current_texture = font_texture;
self.draw_calls = 0; self.draw_calls = 0;
// Set up GL state once for the entire frame
glUseProgram(self.shader);
proj := Mat4.ortho(0.0, width, height, 0.0, -1.0, 1.0); proj := Mat4.ortho(0.0, width, height, 0.0, -1.0, 1.0);
glUniformMatrix4fv(self.proj_loc, 1, 0, proj.data);
glUniform1i(self.tex_loc, 0); if self.has_gpu {
glActiveTexture(GL_TEXTURE0); // Pipeline state + vertex buffer + projection + initial texture.
glBindTexture(GL_TEXTURE_2D, font_texture); // Metal blend mode + scissor-cleared defaults are baked into
glBindVertexArray(self.vao); // the pipeline state, so no per-frame glEnable/glDisable.
glBindBuffer(GL_ARRAY_BUFFER, self.vbo); self.gpu.set_shader(self.mtl_shader);
self.gpu.set_vertex_buffer(self.mtl_vbuf);
self.gpu.set_vertex_constants(1, xx proj.data, 64);
self.gpu.set_texture(0, font_texture);
} else {
// GL: bind everything for the frame.
glUseProgram(self.shader);
glUniformMatrix4fv(self.proj_loc, 1, 0, proj.data);
glUniform1i(self.tex_loc, 0);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, font_texture);
glBindVertexArray(self.vao);
glBindBuffer(GL_ARRAY_BUFFER, self.vbo);
}
} }
bind_texture :: (self: *UIRenderer, tex: u32) { bind_texture :: (self: *UIRenderer, tex: u32) {
@@ -202,18 +234,33 @@ UIRenderer :: struct {
} }
case .clip_push: { case .clip_push: {
self.flush(); self.flush();
glEnable(GL_SCISSOR_TEST);
dpi := self.dpi_scale; dpi := self.dpi_scale;
glScissor( if self.has_gpu {
xx (node.frame.origin.x * dpi), // Metal: pixel coords, top-left origin (no Y flip).
xx ((self.screen_height - node.frame.origin.y - node.frame.size.height) * dpi), self.gpu.set_scissor(
xx (node.frame.size.width * dpi), xx (node.frame.origin.x * dpi),
xx (node.frame.size.height * dpi) xx (node.frame.origin.y * dpi),
); xx (node.frame.size.width * dpi),
xx (node.frame.size.height * dpi),
);
} else {
// GL: pixel coords, bottom-left origin — flip Y.
glEnable(GL_SCISSOR_TEST);
glScissor(
xx (node.frame.origin.x * dpi),
xx ((self.screen_height - node.frame.origin.y - node.frame.size.height) * dpi),
xx (node.frame.size.width * dpi),
xx (node.frame.size.height * dpi)
);
}
} }
case .clip_pop: { case .clip_pop: {
self.flush(); self.flush();
glDisable(GL_SCISSOR_TEST); if self.has_gpu {
self.gpu.disable_scissor();
} else {
glDisable(GL_SCISSOR_TEST);
}
} }
case .opacity_push: {} case .opacity_push: {}
case .opacity_pop: {} case .opacity_pop: {}
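Note on the scissor branches above: the only difference between the two paths is the Y origin. A minimal C sketch of the two conversions follows, assuming the `xx` cast truncates toward zero and that frame coordinates and `screen_h` are in points. `ScissorRect`, `scissor_top_left`, and `scissor_bottom_left` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Pixel-space scissor rectangle. */
typedef struct {
    int x;
    int y;
    int w;
    int h;
} ScissorRect;

/* Top-left origin (the Metal branch): scale by DPI, no flip. */
ScissorRect scissor_top_left(float x, float y, float w, float h, float dpi) {
    assert(dpi > 0.0f);
    ScissorRect r = { (int)(x * dpi), (int)(y * dpi),
                      (int)(w * dpi), (int)(h * dpi) };
    return r;
}

/* Bottom-left origin (the GL branch): flip Y against the screen height
 * in points, then scale by DPI. */
ScissorRect scissor_bottom_left(float screen_h, float x, float y,
                                float w, float h, float dpi) {
    assert(dpi > 0.0f);
    ScissorRect r = { (int)(x * dpi), (int)((screen_h - y - h) * dpi),
                      (int)(w * dpi), (int)(h * dpi) };
    return r;
}
```

For a 100x50 rect at (10, 20) on a 600-point-high screen with a DPI scale of 2, the top-left form yields y = 40 and the bottom-left form yields y = 1060; x, width, and height are the same in both.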
@@ -225,13 +272,22 @@ UIRenderer :: struct {
flush :: (self: *UIRenderer) { flush :: (self: *UIRenderer) {
if self.vertex_count == 0 { return; } if self.vertex_count == 0 { return; }
// Only bind the current texture (program, projection, VAO already bound in begin())
glBindTexture(GL_TEXTURE_2D, self.current_texture);
upload_size : s64 = self.vertex_count * UI_VERTEX_BYTES; upload_size : s64 = self.vertex_count * UI_VERTEX_BYTES;
// Use glBufferData to orphan the old buffer and avoid GPU sync stalls
glBufferData(GL_ARRAY_BUFFER, xx upload_size, self.vertices, GL_DYNAMIC_DRAW); if self.has_gpu {
glDrawArrays(GL_TRIANGLES, 0, xx self.vertex_count); // Mirror the GL path: bind current texture before drawing.
// current_texture may have changed since the last flush.
self.gpu.set_texture(0, self.current_texture);
self.gpu.update_buffer(self.mtl_vbuf, xx self.vertices, upload_size);
self.gpu.draw_triangles(0, xx self.vertex_count);
} else {
// Only re-bind the current texture (program, projection, VAO
// already bound in begin()). glBufferData orphans the old buffer
// to avoid GPU sync stalls.
glBindTexture(GL_TEXTURE_2D, self.current_texture);
glBufferData(GL_ARRAY_BUFFER, xx upload_size, self.vertices, GL_DYNAMIC_DRAW);
glDrawArrays(GL_TRIANGLES, 0, xx self.vertex_count);
}
self.vertex_count = 0; self.vertex_count = 0;
self.draw_calls += 1; self.draw_calls += 1;
@@ -458,3 +514,87 @@ void main() {
} }
} }
GLSL; GLSL;
// --- Metal (MSL) — single library with vmain/fmain entry points ---
//
// `packed_float2` / `packed_float4` keep the 12-float interleaved vertex
// layout (pos2 / uv2 / color4 / params4 = 48 bytes) tight. MSL's default
// `float4` is 16-byte aligned, so unpacked fields can pick up padding when
// the preceding fields do not end on a 16-byte boundary (see
// examples/63-metal-clear.sx for the gotcha).
//
// Uniform passing: GL uploads the projection to the `uProj` uniform with
// `glUniformMatrix4fv`; Metal receives it via
// `setVertexBytes:length:atIndex:` at index 1 (index 0 is the vertex
// buffer). Texture binding goes through `setFragmentTexture:atIndex:` at
// index 0.
UI_MSL_SRC :: #string MSL
#include <metal_stdlib>
using namespace metal;
struct UIVertex {
packed_float2 pos;
packed_float2 uv;
packed_float4 color;
packed_float4 params;
};
struct VOut {
float4 position [[position]];
float2 uv;
float4 color;
float4 params;
};
vertex VOut vmain(uint vid [[vertex_id]],
constant UIVertex* verts [[buffer(0)]],
constant float4x4& proj [[buffer(1)]]) {
UIVertex v = verts[vid];
VOut o;
o.position = proj * float4(v.pos, 0.0, 1.0);
o.uv = float2(v.uv);
o.color = float4(v.color);
o.params = float4(v.params);
return o;
}
static float roundedBoxSDF(float2 center, float2 half_size, float radius) {
float2 q = abs(center) - half_size + float2(radius);
return length(max(q, float2(0.0))) + min(max(q.x, q.y), 0.0) - radius;
}
fragment float4 fmain(VOut in [[stage_in]],
texture2d<float> tex [[texture(0)]]) {
constexpr sampler s(coord::normalized, address::clamp_to_edge, filter::linear);
float mode = in.params.x;
float border = in.params.y;
float2 rectSize = in.params.zw;
if (mode < -1.5) {
// Image mode (mode == -2.0): sample texture
return tex.sample(s, in.uv) * in.color;
} else if (mode < 0.0) {
// Text mode (mode == -1.0): sample glyph atlas .r as alpha
float alpha = tex.sample(s, in.uv).r;
float ew = fwidth(alpha) * 0.7;
alpha = smoothstep(0.5 - ew, 0.5 + ew, alpha);
return float4(in.color.rgb, in.color.a * pow(alpha, 0.9));
} else if (mode > 0.0 || border > 0.0) {
// Rounded rect: SDF alpha, vertex color only
float2 half_size = rectSize * 0.5;
float2 center = (in.uv - float2(0.5)) * rectSize;
float dist = roundedBoxSDF(center, half_size, mode);
float aa = fwidth(dist);
float alpha = 1.0 - smoothstep(-aa, aa, dist);
if (border > 0.0) {
float inner = roundedBoxSDF(center, half_size - float2(border), max(mode - border, 0.0));
float border_alpha = smoothstep(-aa, aa, inner);
alpha = alpha * max(border_alpha, 0.0);
}
return float4(in.color.rgb, in.color.a * alpha);
} else {
// Plain rect: vertex color only
return in.color;
}
}
MSL;
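Note on the vertex layout: the MSL `UIVertex` struct has to agree byte for byte with what the CPU writes into the scratch buffer. A minimal C sketch of a compile-time check for that contract follows. `UIVertexCPU` and `ui_upload_size` are hypothetical names; the 48-byte stride and the 0 / 8 / 16 / 32 offsets are the ones the GL path passes to `glVertexAttribPointer`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side mirror of the MSL UIVertex struct: 12 floats. */
typedef struct {
    float pos[2];    /* offset 0  */
    float uv[2];     /* offset 8  */
    float color[4];  /* offset 16 */
    float params[4]; /* offset 32: corner_radius, border_width, rect_w, rect_h */
} UIVertexCPU;

enum { UI_VERTEX_BYTES = 48 };

/* Fail the build if the layout drifts from the stride and offsets
 * that both backends assume. */
static_assert(sizeof(UIVertexCPU) == UI_VERTEX_BYTES, "stride must be 48 bytes");
static_assert(offsetof(UIVertexCPU, pos) == 0, "pos offset");
static_assert(offsetof(UIVertexCPU, uv) == 8, "uv offset");
static_assert(offsetof(UIVertexCPU, color) == 16, "color offset");
static_assert(offsetof(UIVertexCPU, params) == 32, "params offset");

/* Bytes to upload for a batch, matching vertex_count * UI_VERTEX_BYTES. */
size_t ui_upload_size(size_t vertex_count) {
    return vertex_count * sizeof(UIVertexCPU);
}
```

With `MAX_UI_VERTICES` at 16384, the full scratch buffer comes to 786432 bytes under this layout.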