video: pre-ship review fixes for the FFmpeg renderer

Six prod-blocking issues and three correctness improvements from an
independent code review of 7243ef7. Verified on Huawei Mate 20
(EMUI 11): playback, rotation, replay-after-end all still work.

- EAGAIN on avcodec_send_packet was silently dropping the input
  packet (SimpleDecoder consumed it before we could retry).
  ffmpeg_jni.cc now caches a frame drained from the output queue
  into pending_frame, retries the send, and the next
  ffmpegVideoReceiveFrame emits the cached frame in order before
  pulling a new one.
- The Long.MIN_VALUE PTS sentinel arriving from Media3 matching
  libavcodec's AV_NOPTS_VALUE was an undocumented coincidence between
  two upstreams (strictly, Long.MIN_VALUE is C.TIME_END_OF_SOURCE;
  C.TIME_UNSET is Long.MIN_VALUE + 1). Gate it explicitly so a future
  Media3 sentinel change can't scramble display-order PTS recovery.
- supportsFormat parses the H.264 profile from format.codecs and
  rejects non-8-bit profiles (High 10 / High 4:2:2 / High 4:4:4).
  These initialise libavcodec cleanly and only fail at the first
  receive, too late for ExoPlayer to fall through to MediaCodec.
  Rejecting upfront lets the platform decoder pick them up.
- build_ffmpeg.sh wraps the whole run in a portable mkdir-based lock
  and clones into a staging dir + atomic rename with a sentinel file.
  Concurrent Gradle daemons no longer corrupt each other; an
  interrupted clone leaves no usable state for the next run to
  mistake as finished.
- FfmpegOutputSurface and VideoCompositor both used to call
  eglTerminate(EGL_DEFAULT_DISPLAY) on teardown. That display is
  process-global and shared, so the first teardown killed the other
  consumer's surface. Drop both calls; per-context cleanup +
  eglReleaseThread is sufficient. Likely cause of any "frozen surface
  after second video" report.
- Rotation swap in renderOutputBuffer mutates the public
  outputBuffer.width/height. Bound it to SURFACE_YUV output mode via
  a currentOutputMode tracker; YUV-mode consumers
  (VideoDecoderOutputBufferRenderer.setOutputBuffer) read
  width/height expecting CODED dims that match yuvStrides[0], and the
  swap would walk chroma off the end of the allocation.
- Fragment shader bumped from mediump to highp. The limited-range
  pre-scale (y - 16/255) * (255/219) was at risk of quantizing
  through 10-bit mediump and banding dark gradients on older Mali /
  Adreno parts. highp on the fragment is universally supported on
  GLES2 implementations Android ships post-2014.
- Threading config comment was wrong about what FF_THREAD_SLICE does
  for H.264. Replace with the accurate explanation (slice threading
  degenerates to single-threaded on iOS's single-slice encodes; FRAME
  threading is rejected because of the input-side latency, not
  because libavcodec doesn't support it).
- FfmpegVideoDecoder header documents two known limits the review
  surfaced but that don't have a clean fix at this layer: EOS
  tail-frame loss (~500 ms truncation on first play-through only;
  replay is fine because flush_buffers clears libavcodec) and the
  size-based colorspace heuristic mislabelling iPhone 6/7-era
  unspecified-metadata BT.601 1080p clips as BT.709.
2026-05-29 07:33:20 +03:00
parent 7243ef7de4
commit c0d55babf3
6 changed files with 206 additions and 36 deletions
--- a/android/ffmpeg/build_ffmpeg.sh
+++ b/android/ffmpeg/build_ffmpeg.sh
@@ -42,20 +42,64 @@ esac
 mkdir -p "$WORK_DIR" "$OUTPUT_DIR"
 cd "$WORK_DIR"

+# Serialise concurrent invocations on the shared WORK_DIR so two
+# Gradle daemons (or two parallel app builds depending on this AAR
+# via the same checkout) can't race on clone / cmake / ninja.
+# `mkdir` is atomic per POSIX — first caller wins. `flock` would be
+# nicer but macOS doesn't ship it. A stale lock from a killed prior
+# run (>30 min old) is broken automatically. The trap clears the
+# lock on normal exit.
+LOCK_DIR="$WORK_DIR/.build-lock"
+if [[ -d "$LOCK_DIR" ]]; then
+  if find "$LOCK_DIR" -maxdepth 0 -mmin +30 2>/dev/null | grep -q .; then
+    echo "[ffmpeg-build] removing stale lock (>30 min old)"
+    rm -rf "$LOCK_DIR"
+  fi
+fi
+LOCK_WAIT_SECS=0
+while ! mkdir "$LOCK_DIR" 2>/dev/null; do
+  if [[ "$LOCK_WAIT_SECS" -ge 1800 ]]; then
+    echo "[ffmpeg-build] timed out waiting for $LOCK_DIR" >&2
+    exit 1
+  fi
+  if [[ "$LOCK_WAIT_SECS" -eq 0 ]]; then
+    echo "[ffmpeg-build] another build in progress at $LOCK_DIR, waiting..."
+  fi
+  sleep 5
+  LOCK_WAIT_SECS=$((LOCK_WAIT_SECS + 5))
+done
+trap 'rm -rf "$LOCK_DIR"' EXIT
+
+# Sentinel files mark a clone as fully complete so an interrupted
+# clone (network drop, ^C, OOM kill) doesn't leave a half-populated
+# directory the next run mistakes for a finished checkout. Clone
+# into a staging dir, then atomic-rename into place once the
+# sentinel is written.
+clone_if_missing() {
+  local target="$1"
+  local sentinel="$target/.ux-ffmpeg-build-complete"
+  local tag="$2"
+  local url="$3"
+  local label="$4"
+  if [[ -f "$sentinel" ]]; then
+    return
+  fi
+  # Stale partial clone — wipe before re-clone.
+  rm -rf "$target" "${target}.staging"
+  echo "[ffmpeg-build] cloning $label @${tag}"
+  git clone --depth 1 --branch "$tag" "$url" "${target}.staging"
+  touch "${target}.staging/.ux-ffmpeg-build-complete"
+  mv "${target}.staging" "$target"
+}
+
 # 1. Upstream sources — clone once, reuse on subsequent runs.
 MEDIA3_DIR="$WORK_DIR/media3"
-if [[ ! -d "$MEDIA3_DIR" ]]; then
-  echo "[ffmpeg-build] cloning Media3 @${MEDIA3_TAG}"
-  git clone --depth 1 --branch "$MEDIA3_TAG" \
-    https://github.com/androidx/media.git "$MEDIA3_DIR"
-fi
+clone_if_missing "$MEDIA3_DIR" "$MEDIA3_TAG" \
+  "https://github.com/androidx/media.git" "Media3"

 FFMPEG_DIR="$MEDIA3_DIR/libraries/decoder_ffmpeg/src/main/jni/ffmpeg"
-if [[ ! -d "$FFMPEG_DIR" ]]; then
-  echo "[ffmpeg-build] cloning FFmpeg @${FFMPEG_TAG}"
-  git clone --depth 1 --branch "$FFMPEG_TAG" \
-    https://git.ffmpeg.org/ffmpeg.git "$FFMPEG_DIR"
-fi
+clone_if_missing "$FFMPEG_DIR" "$FFMPEG_TAG" \
+  "https://git.ffmpeg.org/ffmpeg.git" "FFmpeg"

 # 2. Drop our extended JNI source + CMake config over the upstream copies
 #    so the build produces a video-capable libffmpegJNI.so.
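The mkdir-based lock in the hunk above depends on directory creation being an atomic check-and-create. As a rough illustration of the same first-caller-wins idea outside the shell script, here is a minimal Java sketch. The class `DirLock` and its method names are hypothetical and not part of this commit; only `Files.createDirectory` and `Files.deleteIfExists` are real JDK calls.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of the commit: a lock backed by a directory. */
final class DirLock {
  private final Path lockDir;

  DirLock(Path lockDir) {
    this.lockDir = lockDir;
  }

  /** Returns true if this caller created the lock directory, false if it already existed. */
  boolean tryAcquire() throws IOException {
    try {
      // The existence check and the creation happen as one atomic operation.
      Files.createDirectory(lockDir);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }

  /** Removes the lock directory; plays the role of the script's EXIT trap. */
  void release() throws IOException {
    Files.deleteIfExists(lockDir);
  }
}
```

A caller would loop on `tryAcquire()` with a sleep and a timeout, as the script does with its 5-second poll and 1800-second limit. Stale-lock handling is left out of this sketch.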
--- a/android/ffmpeg/ffmpeg_jni.cc
+++ b/android/ffmpeg/ffmpeg_jni.cc
@@ -116,10 +116,14 @@ static int transformError(int err) {
 // Decoder state held across JNI calls; the long handle returned by
 // videoInitialize is a pointer to one of these. AVCodecContext alone
 // isn't enough because we want a reusable AVFrame to avoid per-decode
-// allocation churn.
+// allocation churn, plus a pending_frame slot to cache frames pulled
+// during a send-side EAGAIN drain so the next receiveFrame call emits
+// them in order instead of losing them.
 struct UxFfmpegVideoContext {
  AVCodecContext* codec_ctx = nullptr;
  AVFrame* frame = nullptr;
+  AVFrame* pending_frame = nullptr;
+  bool has_pending = false;
 };

 static void releaseContext(UxFfmpegVideoContext* ctx) {
@@ -127,6 +131,9 @@ static void releaseContext(UxFfmpegVideoContext* ctx) {
  if (ctx->frame) {
    av_frame_free(&ctx->frame);
  }
+  if (ctx->pending_frame) {
+    av_frame_free(&ctx->pending_frame);
+  }
  if (ctx->codec_ctx) {
    avcodec_free_context(&ctx->codec_ctx);
  }
@@ -223,11 +230,15 @@ VIDEO_DECODER_FUNC(jlong, ffmpegVideoInitialize, jstring codecName,
  }

  ctx->codec_ctx->thread_count = threads > 0 ? threads : 0;
-  // Slice threading only. FRAME threading buffers thread_count
-  // input frames before producing output; that extra latency
-  // pushes frames past their PTS deadline and ExoPlayer drops
-  // them, leaving render rate well below source rate. Slice
-  // threading gives parallelism without the input-side delay.
+  // FF_THREAD_SLICE only. FRAME threading buffers thread_count
+  // input frames before producing output, pushing decoded frames
+  // past their PTS deadline and causing ExoPlayer to drop them.
+  // Most iOS-captured H.264 emits one slice per frame, so slice
+  // threading degenerates to single-threaded; libavcodec's H.264
+  // decoder does not auto-promote SLICE-only to FRAME, so we
+  // accept modest throughput in exchange for low latency. 480p
+  // decode is ~2 ms per frame single-threaded on any modern ARM
+  // core anyway.
  ctx->codec_ctx->thread_type = FF_THREAD_SLICE;
  ctx->codec_ctx->err_recognition = AV_EF_IGNORE_ERR;
  // PTS values are passed in microseconds (Media3's native unit),
@@ -245,11 +256,13 @@ VIDEO_DECODER_FUNC(jlong, ffmpegVideoInitialize, jstring codecName,
  }

  ctx->frame = av_frame_alloc();
-  if (!ctx->frame) {
+  ctx->pending_frame = av_frame_alloc();
+  if (!ctx->frame || !ctx->pending_frame) {
    LOGE("ffmpegVideoInitialize: av_frame_alloc failed");
    releaseContext(ctx);
    return 0L;
  }
+  ctx->has_pending = false;
  return (jlong)ctx;
 }

@@ -265,6 +278,10 @@ VIDEO_DECODER_FUNC(jint, ffmpegVideoSendPacket, jlong handle, jobject inputData,
  }
  UxFfmpegVideoContext* ctx = (UxFfmpegVideoContext*)handle;
  uint8_t* buf = (uint8_t*)env->GetDirectBufferAddress(inputData);
+  if (!buf) {
+    LOGE("ffmpegVideoSendPacket: GetDirectBufferAddress null");
+    return VIDEO_DECODER_ERROR_OTHER;
+  }
  AVPacket* pkt = av_packet_alloc();
  if (!pkt) {
    LOGE("ffmpegVideoSendPacket: av_packet_alloc failed");
@@ -272,11 +289,34 @@ VIDEO_DECODER_FUNC(jint, ffmpegVideoSendPacket, jlong handle, jobject inputData,
  }
  pkt->data = buf;
  pkt->size = inputSize;
-  pkt->pts = (int64_t)ptsUs;
+  // Media3's C.TIME_UNSET is Long.MIN_VALUE + 1 and C.TIME_END_OF_SOURCE
+  // is Long.MIN_VALUE (== libavcodec's AV_NOPTS_VALUE); map both to
+  // AV_NOPTS_VALUE explicitly so neither sentinel leaks in as a real PTS.
+  pkt->pts = (ptsUs <= INT64_MIN + 1) ? AV_NOPTS_VALUE : (int64_t)ptsUs;
  pkt->dts = AV_NOPTS_VALUE;
+
+  // Per libavcodec contract, EAGAIN on send means the packet was NOT
+  // consumed and the caller must drain output before re-sending. We
+  // can't return EAGAIN to SimpleDecoder (its 1-in / 1-out model
+  // would consume the input buffer and lose the packet), so when the
+  // queue is full we drain one frame into pending_frame and retry.
+  // pending_frame is then emitted by the next ffmpegVideoReceiveFrame
+  // call before pulling a new one from libavcodec.
  int result = avcodec_send_packet(ctx->codec_ctx, pkt);
+  if (result == AVERROR(EAGAIN) && !ctx->has_pending) {
+    int recv = avcodec_receive_frame(ctx->codec_ctx, ctx->pending_frame);
+    if (recv == 0) {
+      ctx->has_pending = true;
+      result = avcodec_send_packet(ctx->codec_ctx, pkt);
+    } else {
+      logError("send-EAGAIN drain receive", recv);
+    }
+  }
  av_packet_free(&pkt);
  if (result == AVERROR(EAGAIN)) {
+    // Pending slot already full; drop this packet rather than block.
+    // Should never happen at steady state given numOutputBuffers=16.
+    LOGE("ffmpegVideoSendPacket: queue full and pending slot occupied");
    return VIDEO_DECODER_READ_AGAIN;
  }
  if (result < 0) {
@@ -298,16 +338,25 @@ VIDEO_DECODER_FUNC(jint, ffmpegVideoReceiveFrame, jlong handle,
    return VIDEO_DECODER_ERROR_OTHER;
  }
  UxFfmpegVideoContext* ctx = (UxFfmpegVideoContext*)handle;
-  int result = avcodec_receive_frame(ctx->codec_ctx, ctx->frame);
-  if (result == AVERROR(EAGAIN) || result == AVERROR_EOF) {
-    return VIDEO_DECODER_READ_AGAIN;
-  }
-  if (result < 0) {
-    logError("avcodec_receive_frame", result);
-    return transformError(result);
-  }
-
  AVFrame* f = ctx->frame;
+  // If a frame was drained into pending_frame to recover from a
+  // send-side EAGAIN, emit it before pulling the next one — keeps
+  // display-order continuity even when libavcodec backpressures the
+  // input queue.
+  if (ctx->has_pending) {
+    av_frame_unref(f);
+    av_frame_move_ref(f, ctx->pending_frame);
+    ctx->has_pending = false;
+  } else {
+    int result = avcodec_receive_frame(ctx->codec_ctx, f);
+    if (result == AVERROR(EAGAIN) || result == AVERROR_EOF) {
+      return VIDEO_DECODER_READ_AGAIN;
+    }
+    if (result < 0) {
+      logError("avcodec_receive_frame", result);
+      return transformError(result);
+    }
+  }
  // Only planar 4:2:0 YUV is supported by VideoDecoderOutputBuffer's
  // 3-plane layout. iOS H.264 produces YUV420P (limited range) or
  // YUVJ420P (full range); identical memory layout, only range
@@ -398,6 +447,10 @@ VIDEO_DECODER_FUNC(jint, ffmpegVideoReceiveFrame, jlong handle,
 VIDEO_DECODER_FUNC(void, ffmpegVideoFlush, jlong handle) {
  if (!handle) return;
  UxFfmpegVideoContext* ctx = (UxFfmpegVideoContext*)handle;
+  if (ctx->has_pending) {
+    av_frame_unref(ctx->pending_frame);
+    ctx->has_pending = false;
+  }
  avcodec_flush_buffers(ctx->codec_ctx);
 }
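The pending-slot recovery in ffmpegVideoSendPacket / ffmpegVideoReceiveFrame can be modelled without libavcodec. The sketch below is a toy: `ToyCodec` and `PendingSlotDecoder` are hypothetical names, frames are plain integers, and the "codec" simply refuses input while its output queue is full. It only aims to show that parking one drained frame and retrying the send keeps output in order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stand-in for a decoder (not libavcodec): bounded output queue, identity "decode". */
final class ToyCodec {
  static final int OK = 0;
  static final int EAGAIN = -1;
  private final Deque<Integer> output = new ArrayDeque<>();
  private final int capacity;

  ToyCodec(int capacity) {
    this.capacity = capacity;
  }

  /** Returns EAGAIN and leaves the packet unconsumed while the output queue is full. */
  int send(int packet) {
    if (output.size() >= capacity) return EAGAIN;
    output.addLast(packet);
    return OK;
  }

  /** Returns the oldest decoded frame, or null when nothing is queued. */
  Integer receive() {
    return output.pollFirst();
  }
}

/** Mirrors the wrapper logic from the diff with a single pending slot. */
final class PendingSlotDecoder {
  private final ToyCodec codec;
  private Integer pending; // null means the slot is empty

  PendingSlotDecoder(ToyCodec codec) {
    this.codec = codec;
  }

  int sendPacket(int packet) {
    int result = codec.send(packet);
    if (result == ToyCodec.EAGAIN && pending == null) {
      Integer drained = codec.receive();
      if (drained != null) {
        pending = drained;           // park the drained frame
        result = codec.send(packet); // retry now that there is room
      }
    }
    return result;
  }

  Integer receiveFrame() {
    if (pending != null) { // emit the parked frame before pulling a new one
      Integer frame = pending;
      pending = null;
      return frame;
    }
    return codec.receive();
  }
}
```

With a capacity of 1, sending packets 1 and 2 back to back succeeds, and the next two `receiveFrame()` calls return 1 then 2. A third send before any receive returns `EAGAIN`, which corresponds to the "pending slot occupied" branch in the native code.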

--- a/android/src/main/java/io/swipelab/ux/video/ffmpeg/FfmpegOutputSurface.java
+++ b/android/src/main/java/io/swipelab/ux/video/ffmpeg/FfmpegOutputSurface.java
@@ -98,8 +98,13 @@ final class FfmpegOutputSurface {
  // yuvj420p) and limited-range conversion. uSampleScale rescales the
  // horizontal texture coordinate to skip the right-side padding that
  // FFmpeg's SIMD-aligned linesize introduces (yStride >= width).
+  // highp on the fragment so the limited-range pre-scale
+  // `(y - 16/255) * (255/219)` doesn't quantize through 10-bit-ish
+  // mediump precision and band dark gradients on older Mali / Adreno
+  // parts. highp on a fragment shader is universally supported on
+  // GLES2 implementations Android ships post-2014.
  private static final String FRAGMENT_SHADER =
-      "precision mediump float;\n"
+      "precision highp float;\n"
      + "varying vec2 vTex;\n"
      + "uniform sampler2D uY;\n"
      + "uniform sampler2D uU;\n"
@@ -409,7 +414,11 @@ final class FfmpegOutputSurface {
        eglContext = EGL14.EGL_NO_CONTEXT;
      }
      EGL14.eglReleaseThread();
-      EGL14.eglTerminate(eglDisplay);
+      // NB: do NOT eglTerminate(EGL_DEFAULT_DISPLAY) here — the
+      // display is shared with VideoCompositor's EGL context, and
+      // tearing it down would silently kill the other consumer's
+      // surface. eglDestroyContext + eglReleaseThread is sufficient
+      // to clean up our share.
      eglDisplay = EGL14.EGL_NO_DISPLAY;
    }
    quadBuffer = null;
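For reference, the limited-range pre-scale named in the shader comment, `(y - 16/255) * (255/219)`, maps luma code 16 to 0.0 and code 235 to 1.0. The Java sketch below works through that arithmetic on the CPU in double precision. The class and method names are hypothetical, and it says nothing about how a particular GPU rounds at mediump.

```java
/** Hypothetical illustration of the shader's luma pre-scale, evaluated in double precision. */
final class LimitedRange {
  /** Expands an 8-bit limited-range luma code (nominally 16..235) to a full-range value. */
  static double expandLuma(int code) {
    double y = code / 255.0;                     // normalised sample, as a texture fetch would return
    return (y - 16.0 / 255.0) * (255.0 / 219.0); // same expression as in the shader comment
  }
}
```

Codes outside 16..235 land outside [0, 1]; for example code 0 gives roughly -0.073.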
--- a/android/src/main/java/io/swipelab/ux/video/ffmpeg/FfmpegVideoDecoder.java
+++ b/android/src/main/java/io/swipelab/ux/video/ffmpeg/FfmpegVideoDecoder.java
@@ -26,6 +26,27 @@ import java.util.List;
 * directly onto libavcodec's avcodec_send_packet / avcodec_receive_frame
 * lifecycle so we can drain multiple reordered frames out of a single
 * input packet.
+ *
+ * <h3>Known limits</h3>
+ * <ul>
+ *   <li><b>EOS trailing frames.</b> Media3's {@code SimpleDecoder}
+ *   base class special-cases the end-of-stream input buffer and
+ *   never invokes our {@code decode()} for it, so libavcodec's
+ *   reorder buffer (~16 frames for iOS H.264 High@3.1) is never
+ *   drained with {@code avcodec_send_packet(NULL)}. The last ~500 ms
+ *   of a clip can be truncated on first play-through. Replay via
+ *   {@code REPEAT_MODE_ONE} or {@code seekTo(0)} hits
+ *   {@code avcodec_flush_buffers} which clears the queue, so the
+ *   second play and onwards are full-length.</li>
+ *   <li><b>Colorspace heuristic.</b> When the bitstream's
+ *   {@code colorspace}/{@code primaries}/{@code transfer} are all
+ *   unspecified we fall back to a size-based guess (BT.709 for >=
+ *   720p, BT.601 below). iPhone 6/7-era 1080p clips that recorded
+ *   BT.601 with unspecified metadata get mislabelled — skin tones
+ *   are slightly oversaturated. Modern iOS sets {@code bt709}
+ *   explicitly so trusting the bitstream is correct for almost
+ *   everything in circulation today.</li>
+ * </ul>
 */
@UnstableApi
 public final class FfmpegVideoDecoder
--- a/android/src/main/java/io/swipelab/ux/video/ffmpeg/FfmpegVideoRenderer.java
+++ b/android/src/main/java/io/swipelab/ux/video/ffmpeg/FfmpegVideoRenderer.java
@@ -63,6 +63,7 @@ public final class FfmpegVideoRenderer extends DecoderVideoRenderer {
  private int surfaceWidth = -1;
  private int surfaceHeight = -1;
  private int surfaceRotation = 0;
+  private @C.VideoOutputMode int currentOutputMode = C.VIDEO_OUTPUT_MODE_NONE;

  public FfmpegVideoRenderer(
      long allowedJoiningTimeMs,
@@ -111,6 +112,16 @@ public final class FfmpegVideoRenderer extends DecoderVideoRenderer {
    if (!FfmpegLibrary.supportsFormat(mime)) {
      return RendererCapabilities.create(C.FORMAT_UNSUPPORTED_SUBTYPE);
    }
+    if (!supports8BitH264Profile(format.codecs)) {
+      // The YUV path only handles planar 4:2:0 8-bit (yuv420p /
+      // yuvj420p). H.264 High 10 / High 4:2:2 / High 4:4:4 streams
+      // initialise libavcodec cleanly and only fail at the first
+      // receive; by then ExoPlayer has committed to this renderer
+      // and can't fall back. Reject upfront so the platform
+      // MediaCodec path (which often handles these via hardware) gets
+      // selected instead.
+      return RendererCapabilities.create(C.FORMAT_UNSUPPORTED_SUBTYPE);
+    }
    if (format.cryptoType != C.CRYPTO_TYPE_NONE) {
      return RendererCapabilities.create(C.FORMAT_UNSUPPORTED_DRM);
    }
@@ -118,6 +129,26 @@ public final class FfmpegVideoRenderer extends DecoderVideoRenderer {
        C.FORMAT_HANDLED, ADAPTIVE_SEAMLESS, TUNNELING_NOT_SUPPORTED);
  }

+  /**
+   * Parses the H.264 profile from an avc1 codec string (e.g.
+   * {@code avc1.640028}). Accepts the 8-bit YUV 4:2:0 profiles —
+   * Baseline (0x42), Main (0x4D), Extended (0x58), High (0x64) —
+   * and rejects everything else. When the codec string is missing or
+   * malformed we permit it: the worst case is a hard fail at decode
+   * time, which is no worse than today's behaviour.
+   */
+  private static boolean supports8BitH264Profile(@Nullable String codecs) {
+    if (codecs == null) return true;
+    String lower = codecs.toLowerCase();
+    if (!lower.startsWith("avc1.") || lower.length() < 11) return true;
+    try {
+      int profile = Integer.parseInt(lower.substring(5, 7), 16);
+      return profile == 0x42 || profile == 0x4D || profile == 0x58 || profile == 0x64;
+    } catch (NumberFormatException e) {
+      return true;
+    }
+  }
+
  @Override
  protected FfmpegVideoDecoder createDecoder(Format format, @Nullable CryptoConfig cryptoConfig)
      throws FfmpegDecoderException {
@@ -134,16 +165,22 @@ public final class FfmpegVideoRenderer extends DecoderVideoRenderer {

  /// Pre-swap buffer dims for 90°/270° rotated streams so the
  /// {@code maybeNotifyVideoSizeChanged} call inside the base
-  /// renderOutputBuffer reports DISPLAY-orientation dimensions (matching
-  /// what MediaCodecVideoRenderer does for the hardware path). Without
-  /// this swap, portrait iOS videos report their coded landscape size
-  /// and the downstream compositor lays out the Flutter texture
-  /// rotated.
+  /// renderOutputBuffer reports DISPLAY-orientation dimensions
+  /// (matching what MediaCodecVideoRenderer does for the hardware
+  /// path). Without this swap, portrait iOS videos report their
+  /// coded landscape size and the downstream compositor lays out the
+  /// Flutter texture rotated. The swap is bounded to SURFACE_YUV
+  /// output mode because YUV-mode consumers
+  /// ({@code VideoDecoderOutputBufferRenderer.setOutputBuffer}) read
+  /// {@code buffer.width}/{@code height} expecting CODED dimensions
+  /// that match {@code yuvStrides[0]} — swapping there would walk
+  /// the chroma planes off the end of the allocation.
  @Override
  protected void renderOutputBuffer(
      VideoDecoderOutputBuffer outputBuffer, long presentationTimeUs, Format outputFormat)
      throws DecoderException {
-    if (outputFormat != null
+    if (currentOutputMode == C.VIDEO_OUTPUT_MODE_SURFACE_YUV
+        && outputFormat != null
        && (outputFormat.rotationDegrees == 90 || outputFormat.rotationDegrees == 270)
        && outputBuffer.width != outputBuffer.height) {
      int tmp = outputBuffer.width;
@@ -183,6 +220,7 @@ public final class FfmpegVideoRenderer extends DecoderVideoRenderer {

  @Override
  protected void setDecoderOutputMode(@C.VideoOutputMode int outputMode) {
+    currentOutputMode = outputMode;
    if (decoder != null) {
      decoder.setOutputMode(outputMode);
    }
@@ -204,6 +242,7 @@ public final class FfmpegVideoRenderer extends DecoderVideoRenderer {
  protected void onDisabled() {
    releaseOutputSurface();
    decoder = null;
+    currentOutputMode = C.VIDEO_OUTPUT_MODE_NONE;
    super.onDisabled();
  }
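To make the profile gate concrete, here is a standalone copy of the parsing logic from supports8BitH264Profile with a few sample codec strings. The wrapper class name `H264ProfileCheck` is hypothetical, the `@Nullable` annotation is dropped, and `Locale.ROOT` is added to the lowercase call as a small deviation from the diff. The sample profile_idc values used below are 0x6E (High 10), 0x7A (High 4:2:2) and 0xF4 (High 4:4:4 Predictive).

```java
import java.util.Locale;

/** Standalone copy of the profile gate for illustration; the class name is hypothetical. */
final class H264ProfileCheck {
  static boolean supports8BitH264Profile(String codecs) {
    if (codecs == null) return true; // missing string: permit, as in the diff
    String lower = codecs.toLowerCase(Locale.ROOT);
    if (!lower.startsWith("avc1.") || lower.length() < 11) return true; // not avc1 or too short: permit
    try {
      // Characters 5..6 are the hex profile_idc, e.g. "64" in avc1.640028.
      int profile = Integer.parseInt(lower.substring(5, 7), 16);
      return profile == 0x42 || profile == 0x4D || profile == 0x58 || profile == 0x64;
    } catch (NumberFormatException e) {
      return true; // malformed hex: permit
    }
  }
}
```

Under this logic `avc1.640028` (High) is accepted, while `avc1.6E0028`, `avc1.7A0028` and `avc1.F40028` are rejected. A non-avc1 string such as `hvc1.2.4.L120.B0` is permitted, because the check only inspects avc1 codec strings.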

--- a/android/src/main/kotlin/io/swipelab/ux/video/VideoCompositor.kt
+++ b/android/src/main/kotlin/io/swipelab/ux/video/VideoCompositor.kt
@@ -387,7 +387,11 @@ internal class VideoCompositor(
        EGL14.eglDestroyContext(eglDisplay, eglContext)
        eglContext = EGL14.EGL_NO_CONTEXT
      }
-      EGL14.eglTerminate(eglDisplay)
+      EGL14.eglReleaseThread()
+      // NB: do NOT eglTerminate(EGL_DEFAULT_DISPLAY) here — the
+      // display is shared with FfmpegOutputSurface's EGL context,
+      // and tearing it down would silently kill the other consumer's
+      // window surface. Per-context cleanup above is enough.
      eglDisplay = EGL14.EGL_NO_DISPLAY
    }
  }