@StoneyEagle
Created April 17, 2026 18:04
NoMercy Encoder v3 — functionality, processes, and capabilities report (11 pages)

NoMercy Encoder v3 — Overview

The NoMercy media server ships with a modern encoder that handles the entire pipeline from "here's a source file" to "here's an adaptive streaming ladder your phone can play." This report walks through what it supports, how it protects you from misconfiguration, and what makes it different from a raw ffmpeg wrapper.

At a glance

Area Coverage
Video codecs H.264 (6 encoders), H.265 / HEVC (6), AV1 (7), VP9 (3)
Audio codecs AAC, Opus, FLAC, AC3, EAC3, MP3, Vorbis, TrueHD, DTS
Subtitle codecs WebVTT, ASS, SRT + OCR for PGS / DVD / DVB bitmap subs
Containers HLS, DASH, MP4, MKV, MP3, FLAC, OGG (single-file audio)
Encoder strategies 10 (single-pass + two-pass variants per container)
Built-in presets 10 (Chromecast, Apple TV 4K HDR, Mobile, 4K Archival, Anime, plus 5 general-purpose)
Hardware encoders NVIDIA NVENC, AMD AMF, Intel QSV + VAAPI, Apple VideoToolbox
HDR support HDR10 passthrough, Dolby Vision, HDR10+, HDR→SDR tonemap via libplacebo / OpenCL / zscale
Distribution Multi-machine encoding over HTTP with HMAC auth
Live transcode On-the-fly HLS for clients that can't play the source
Disc ripping Bluray + DVD scanner and stream-copy ripper
Validator 30+ pre-encode checks that catch user mistakes before an encode starts

Target audience

  • Home server owners who want Netflix-quality adaptive streaming from their own library without having to learn ffmpeg
  • Small-studio editors who need deterministic encodes from a preset library (Apple TV 4K HDR, Chromecast-compatible, etc.)
  • Prosumers with multiple machines who want to distribute encoding across a desktop + workstation + NAS

What this report covers

This is a multi-page tour. Pages are linked to from the file list on the left (in the Gist UI); each page focuses on one subsystem.

  1. Overview — you are here
  2. Architecture — pipeline stages, strategies, DI model
  3. Codecs and formats — every supported codec, every encoder family, container matrix
  4. Profiles and presets — profile structure, built-in preset library, preset inheritance
  5. Safety net — the 30+ validator checks + API endpoints for pre-encode guidance
  6. Hardware and HDR — per-GPU benchmarking, HDR passthrough, Dolby Vision handling
  7. Content analysis — crop detection, intro/outro fingerprinting, subtitle OCR, speech transcription
  8. Subtitles and DRM — text vs bitmap routing, burn-in, AES-128 HLS encryption
  9. Live transcode — on-the-fly playback for clients that can't handle the source
  10. Disc ripping — Bluray / DVD import
  11. Distributed encoding — multi-machine coordination, signed HTTP transport, health tracking, source file streaming, live progress

Design principles

These show up repeatedly across the subsystems:

Fail at profile save, not at encode time. If your settings will produce a broken encode, you hear about it the moment you hit save — not thirty minutes into a 4K job. The validator catches 30+ classes of common misconfiguration with human-readable fix guidance.

Accept any input, produce a curated set of outputs. The encoder reads anything ffmpeg can parse — exotic containers, weird codecs, malformed headers, VFR streams. Output is restricted to a deliberate subset of well-supported formats so you never ship an encode that won't play on your target devices.

Hardware is opt-in, not magic. When you have a GPU, the encoder detects it, benchmarks it against your CPU, and picks the faster path. When you don't, software encoding runs with the same quality profile — you just wait longer. No surprise hardware fallback silently producing worse output.

Every decision is observable. Crop detector ran? You see that in the plan log. ABR ladder auto-generated? The generated rungs are in the progress stream. Remote worker retried? The dashboard shows the worker health history.

No destructive defaults. Source files are never modified. Outputs go into a separate directory tree. Failed encodes clean up after themselves. Resume-after-crash works because of explicit checkpoint files, not implicit "hope nothing moved."

Architecture

The encoder is built as a pipeline of stages wrapped by a strategy layer. Each layer has a narrow, testable job.

The big picture

┌──────────────────────────────────────────────────────────────────────┐
│                        EncodingOrchestrator                          │
│  Reads the profile format + encode mode, picks an IEncodingStrategy  │
└──────────────────────┬───────────────────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────────────────┐
│                       IEncodingStrategy                              │
│  HlsSinglePassStrategy, HlsTwoPassStrategy, DashSinglePassStrategy,  │
│  DashTwoPassStrategy, Mp4SinglePassStrategy, Mp4TwoPassStrategy,     │
│  MkvStrategy, Mp3Strategy, FlacStrategy, OggStrategy                 │
└──────────────────────┬───────────────────────────────────────────────┘
                       ▼
┌───────────────┬──────────────┬──────────────┬───────────────────────┐
│  Validate     │  Analyze     │  Plan        │  Build → Execute →    │
│  stage        │  stage       │  stage       │  Finalize stages      │
└───────────────┴──────────────┴──────────────┴───────────────────────┘
                                    │
                       ┌────────────▼────────────┐
                       │   IWorkerDispatcher     │
                       │  Local / Remote HTTP    │
                       └─────────────────────────┘

Pipeline stages

Every encode walks the same stages in order, described below in five steps (Execute and Finalize are covered together). A strategy can skip or short-circuit stages when it doesn't need them (audio-only encodes skip video planning, etc.).

1. Validate

Runs the ProfileValidator against the incoming profile. Errors at this stage (level vs dimensions, codec/container mismatch, missing outputs) fail the encode immediately with a human-readable message. Warnings are logged but don't block: the profile will produce a working encode, just probably not the one the user meant (preset mismatch, inverted ABR ladder, etc.).

Covered in detail in Safety net.

2. Analyze

Runs ffprobe via MediaAnalyzer to build a MediaInfo record: every video / audio / subtitle / attachment stream with codec, dimensions, bit depth, color space, language tags, chapters, Dolby Vision side-data. Accepts anything ffprobe can parse — no codec allowlist on the input side.

3. Plan

Takes the MediaInfo + EncodingProfile and builds an OutputPlan: which video variants to encode, which audio streams to pass through or transcode, which subtitles to extract or burn in, which filters to apply (crop detection, HDR tonemap, downmix). Produces one ExecutionGroup per parallel-safe batch of operations — the GroupingStrategy + CostEstimator components decide what runs together vs sequentially.

4. Build

Takes the OutputPlan and composes FFmpeg command-line arguments via FFmpegCommandBuilder + FilterGraphBuilder. This is where the encoder-resolver translates profile-level settings into encoder-native flags: Crf=22 becomes -crf 22 on libx264 but -rc vbr -cq 22 on NVENC, -global_quality on QSV, and -q:v on VideoToolbox, with the value rescaled wherever the encoder's native quality range differs (see CRF scaling across encoder families).

5. Execute + Finalize

Spawns FFmpeg via IFfmpegExecutor. Parses its -progress pipe:1 output in real time. On success, FinalizeStage stitches fragmented outputs, writes master playlists, applies chapter markers, fills metadata. On failure, writes a checkpoint for resume-from-here on the next attempt.
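
As a rough illustration of the progress parsing described above, here is a minimal Python sketch (the project itself is .NET, so this is not its code). It assumes ffmpeg's `-progress` output is a stream of `key=value` lines where each block ends with a `progress=continue` or `progress=end` line. The function names `parse_progress` and `percent_done` are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def parse_progress(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per progress block read from `ffmpeg -progress pipe:1`."""
    block: Dict[str, str] = {}
    for raw in lines:
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        block[key] = value.strip()
        if key == "progress":  # "continue" or "end" closes the current block
            yield block
            block = {}


def percent_done(block: Dict[str, str], duration_seconds: float) -> float:
    """Convert the encoded position (microseconds) into a 0-100 percentage."""
    try:
        done_seconds = int(block.get("out_time_us", "0")) / 1_000_000
    except ValueError:
        done_seconds = 0.0  # the field can be "N/A" before a timestamp is known
    return max(0.0, min(100.0, 100.0 * done_seconds / duration_seconds))
```

In practice the `lines` argument would be the stdout of the spawned ffmpeg process, and `duration_seconds` would come from the Analyze stage.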

The strategy layer

A strategy decides which stages to run, in which order, and how to connect them. Strategies are keyed by (OutputFormat, EncodeMode):

Strategy Format Mode
HlsSinglePassStrategy HLS SinglePass
HlsTwoPassStrategy HLS TwoPass
DashSinglePassStrategy DASH SinglePass
DashTwoPassStrategy DASH TwoPass
Mp4SinglePassStrategy MP4 SinglePass
Mp4TwoPassStrategy MP4 TwoPass
MkvStrategy MKV SinglePass (always)
Mp3Strategy MP3 SinglePass
FlacStrategy FLAC SinglePass (lossless)
OggStrategy OGG SinglePass

Plugins can register their own strategies — the resolver is "last-registration-wins" so a plugin can replace the default HLS strategy with a custom variant.
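
The "last-registration-wins" rule can be pictured with a small Python sketch. This is an illustration of the lookup behaviour only, not the NoMercy API: `StrategyRegistry`, `register`, and `resolve` are hypothetical names, and strategies are represented by plain strings.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in: a factory returns the name of the strategy it builds.
StrategyFactory = Callable[[], str]


class StrategyRegistry:
    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], StrategyFactory] = {}

    def register(self, fmt: str, mode: str, factory: StrategyFactory) -> None:
        # A later registration for the same (format, mode) key replaces the earlier one.
        self._by_key[(fmt, mode)] = factory

    def resolve(self, fmt: str, mode: str) -> str:
        try:
            return self._by_key[(fmt, mode)]()
        except KeyError:
            raise LookupError(f"no strategy registered for {fmt}/{mode}") from None


registry = StrategyRegistry()
registry.register("Hls", "SinglePass", lambda: "HlsSinglePassStrategy")
registry.register("Hls", "SinglePass", lambda: "MyPluginHlsStrategy")  # plugin override
print(registry.resolve("Hls", "SinglePass"))  # prints "MyPluginHlsStrategy"
```

Keying on the pair rather than on the format alone is what lets a plugin replace only the single-pass HLS path while leaving the two-pass one untouched.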

Two-pass strategies

Two-pass only makes sense for software encoders. Hardware encoders (NVENC, AMF, QSV) ignore the stats file from pass 1 and get no quality benefit. The two-pass strategies handle:

  • Pass 1 video-only analysis, no audio → stats file
  • Checkpoint the stats file path so a crashed pass 2 can resume
  • Pass 2 full encode using the stats
  • Multi-variant ladders get per-variant stats files (variant 0 writes stats_v0.log, variant 1 writes stats_v1.log, etc.) — resume requires all variant stats present

Dependency injection

Everything is composed via AddNoMercyEncoder() in ServiceCollectionExtensions. One call registers:

  • Pipeline stages + orchestrator
  • All 10 strategies (format × encode-mode matrix)
  • CodecRegistry (video) + AudioCodecDefinitions (audio)
  • CodecResolver (vendor preference), EncoderArgumentResolver
  • IHardwareCapabilities + IFfmpegCapabilities
  • Content analysis: ICropDetector, IAudioFingerprinter, IIntroDetector, ISubtitleOcrEngine, IWhisperTranscriber
  • Output building blocks: IChapterWriter, IFontExtractor, ISubtitleExtractor, IAbrLadderGenerator
  • Distribution layer: registry, assigner, dispatcher, task serializer, source fetcher, progress sink, self-registration
  • Hosted services: HardwareBenchmarkHostedService (first-boot benchmark), HardwareInitializationService (ffmpeg probe), WorkerSelfRegistrationService (remote worker mode)

The EncoderOptions singleton configures all of it:

services.AddNoMercyEncoder(opts =>
{
    opts.FfmpegPathOverride = "/usr/bin/ffmpeg";
    opts.FfprobePathOverride = "/usr/bin/ffprobe";
    opts.TesseractModelsDirectory = "/var/lib/nomercy/tesseract";
    opts.WhisperModelPath = "/var/lib/nomercy/models/ggml-large-v3.bin";
    opts.DistributedEncodingSigningKey = "...";  // enables distribution
});

What makes this different from a raw ffmpeg wrapper

Encoder-family abstraction. Your profile says Codec = H264, Crf = 22, Preset = "medium". The resolver translates to the right flags for whatever encoder is actually available. Swap the server to a box with NVENC and the same profile emits -c:v h264_nvenc -rc vbr -cq 22 -preset p4 instead of -c:v libx264 -crf 22 -preset medium.

Pre-flight validation. Before ffmpeg even spawns, the profile runs through 30+ sanity checks. The common "encoded for 40 minutes, then ffmpeg errored because the level is incompatible with the resolution" story doesn't happen here.

Stateful checkpoints. Two-pass encodes survive mid-pass crashes. The stats file stays, the next attempt reads the checkpoint, skips pass 1, and runs pass 2 directly.

Observable decisions. Every output path through the planner is logged with rationale. The /preview API returns a human-readable action plan per stream ("your DTS 5.1 track will be transcoded to AAC because HLS can't carry DTS") before you kick off the encode.

Codecs and Formats

The encoder restricts outputs to a deliberate subset of well-supported formats. Input is unrestricted — anything ffprobe can parse is fair game.

Video codecs

Four codec families, 22 concrete encoders. The resolver picks the fastest encoder that can deliver your profile's settings on whatever hardware the server has.

H.264 (6 encoders)

Encoder Vendor Notes
libx264 Software The reference. 10 presets, 6 profiles including high10 for 10-bit.
h264_nvenc NVIDIA 7 presets p1..p7. 10-bit not supported on H.264 NVENC. 12 concurrent sessions per card.
h264_amf AMD 3 presets (speed/balanced/quality). Needs -usage transcoding or it picks ULL mode.
h264_qsv Intel 7 presets. Quality range is 1-51 (not 0-51). ICQ rate control available.
h264_vaapi Intel (Linux) No preset concept — driver only. 3 profiles.
h264_videotoolbox Apple Numeric profiles (66/77/100). Quality range 0-100.

Key point: every H.264 hardware encoder lacks 10-bit support. If your profile requests TenBit = true on H.264, the validator warns upfront and the plan stage downgrades to 8-bit with a log entry. HEVC is the way to go for 10-bit.

H.265 / HEVC (6 encoders)

Encoder Vendor Notes
libx265 Software main / main10 / main12 / main422-10 / main444-10. 10-bit + HDR.
hevc_nvenc NVIDIA main / main10 / rext. 10-bit + HDR.
hevc_amf AMD main / main10. 10-bit + HDR. Requires -usage transcoding.
hevc_qsv Intel main / main10 / mainsp / rext / scc. 10-bit + HDR.
hevc_vaapi Intel (Linux) main / main10. No presets.
hevc_videotoolbox Apple Profiles are numeric "1" / "2". 8-bit only. Requires -tag:v hvc1 for Apple client playback — encoder emits it automatically.

AV1 (7 encoders)

Encoder Vendor Quality range Notes
libsvtav1 Software 0-63 Fastest software AV1, 14 presets.
libaom-av1 Software 0-63 Reference encoder, very slow. Preset = cpu-used.
librav1e Software 0-255 Rust encoder. No CRF mode — CQP or VBR only.
av1_nvenc NVIDIA 0-51 12 concurrent sessions. 10-bit + HDR.
av1_amf AMD 0-255 4 presets. Note the QP range — profile CRF values get scaled automatically.
av1_qsv Intel 1-51 7 presets. 8-bit only.
av1_vaapi Intel (Linux) 0-255 No presets.

Apple Silicon decodes AV1 in hardware but cannot encode it. There's no av1_videotoolbox entry — the codec definition deliberately omits it so ForceHardware on a Mac doesn't silently pick a phantom encoder.

VP9 (3 encoders)

Encoder Vendor Notes
libvpx-vp9 Software No presets (uses -deadline / -cpu-used). Numeric profiles.
vp9_qsv Intel 7 presets. Quality 1-51. 8-bit only.
vp9_vaapi Intel (Linux) No presets. 0-255 QP.

VP9 hardware is Intel-only. No NVENC / AMF / VideoToolbox variants exist — VP9 hardware encoding never got broad vendor adoption. If you're on NVIDIA or AMD and pick VP9, you get software encoding.

CRF scaling across encoder families

Problem: libsvtav1 uses a 0-63 CRF scale. av1_amf uses a 0-255 QP scale. A profile that says Crf = 35 would mean "mid-quality" on libsvtav1 but "near-lossless" on av1_amf — same number, wildly different output.

Solution: the EncoderArgumentResolver scales profile-level CRF into the encoder's native range. Profiles are written in software-encoder units (0-51 for H.264/HEVC, 0-63 for AV1/VP9 software). The resolver scales to the target encoder's [Min, Max] automatically:

libsvtav1 Crf=35  →  av1_amf  -qp 142  (was -qp 35 pre-fix)
libsvtav1 Crf=35  →  av1_nvenc -cq 28  (was -cq 35 pre-fix)
libvpx    Crf=33  →  vp9_qsv  -global_quality 27
libx264   Crf=50  →  videotoolbox -q:v 98  (scales 0-51 → 0-100)

The mapping is linear and proportional. It is not perceptually perfect across encoders, but it lands far closer to the intended quality than passing raw values through.
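
A minimal Python sketch of such a linear mapping is shown here. The function name `scale_quality` and the round-to-nearest rule are assumptions; the ranges come from the encoder tables earlier in this page, and with them the sketch reproduces the four example values listed above.

```python
from typing import Tuple

Range = Tuple[int, int]


def scale_quality(value: int, source: Range, target: Range) -> int:
    """Linearly map `value` from the source range onto the target range."""
    src_min, src_max = source
    dst_min, dst_max = target
    fraction = (value - src_min) / (src_max - src_min)
    scaled = dst_min + fraction * (dst_max - dst_min)
    # Clamp so that out-of-range input can never leave the encoder's limits.
    return max(dst_min, min(dst_max, round(scaled)))


print(scale_quality(35, (0, 63), (0, 255)))  # libsvtav1 -> av1_amf, prints 142
print(scale_quality(35, (0, 63), (0, 51)))   # libsvtav1 -> av1_nvenc, prints 28
```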

Audio codecs (9)

| Codec | Encoder | Channels | Sample rates | Bitrate range | Default (kbps) |
| --- | --- | --- | --- | --- | --- |
| AAC | libfdk_aac | 1, 2, 6, 8 | 44.1 / 48 / 96 kHz | 32–512 kbps | 192 |
| Opus | libopus | 1, 2, 6, 8 | 48 kHz only | 6–510 kbps | 128 |
| FLAC | flac | 1, 2, 6, 8 | 44.1 / 48 / 96 / 192 kHz | lossless | |
| AC3 | ac3 | 1, 2, 6 only | 48 kHz | 32–640 kbps (stepped ladder) | 384 |
| EAC3 | eac3 | 1, 2, 6, 8 | 48 kHz | 32–6144 kbps | 640 |
| MP3 | libmp3lame | 1, 2 only | 44.1 / 48 / 96 kHz | 32–320 kbps | 192 |
| Vorbis | libvorbis | 1, 2, 6, 8 | 44.1 / 48 / 96 kHz | 45–500 kbps | 192 |
| TrueHD | truehd | 1, 2, 6, 8 | 48 / 96 kHz | lossless | |
| DTS | dca | 1, 2, 6 only | 48 kHz | 32–1536 kbps | 768 |

Gotchas the validator catches:

  • AC3 doesn't support 7.1, and DTS core has the same constraint
  • MP3 is limited to mono and stereo (surround sound layered on top of MP3 isn't a thing)
  • AAC uses libfdk_aac, not the built-in aac — requires nomercy-ffmpeg with --enable-nonfree
  • DTS encoder name is dca not dts (ffmpeg asymmetry)
  • AC3 silently rounds off-ladder bitrates down — 100 kbps becomes 96 with no warning from ffmpeg, but the validator flags it with "use 96 or 112"
  • Opus always resamples to 48 kHz internally — any other sample rate in the profile is rejected

Subtitle codecs (3 for extraction)

Text-based (handled directly):

  • WebVTT (webvtt / vtt)
  • ASS (ass / ssa)
  • SRT (srt / subrip)

Bitmap-based (OCR'd to text first):

  • PGS (hdmv_pgs_subtitle) — Bluray subs
  • DVD (dvd_subtitle) — DVD subs
  • DVB (dvb_subtitle) — broadcast subs

The SubtitleClassifier routes source streams to the correct extraction path. Covered more in Subtitles and DRM.

Output containers (7)

| Container | Format key | Video codecs | Audio codecs | Subtitle modes |
| --- | --- | --- | --- | --- |
| HLS | Hls | H.264, H.265, AV1 | AAC, AC3, EAC3, Opus | Extract (WebVTT) |
| DASH | Dash | H.264, H.265, AV1, VP9 | AAC, Opus, AC3, EAC3 | Extract (WebVTT / SRT) |
| MP4 | Mp4 | H.264, H.265, AV1 (+ VP9 with warning) | AAC, AC3, EAC3, Opus | Extract (WebVTT / SRT) |
| MKV | Mkv | Everything | Everything | Copy (all codecs) |
| MP3 | Mp3 | none (audio-only) | MP3 only | |
| FLAC | Flac | none (audio-only) | FLAC only | |
| OGG | Ogg | none (audio-only) | Vorbis, Opus, FLAC | |

The audio-only containers (MP3, FLAC, OGG) reject video + subtitle outputs at the profile-validation layer.

Strategy matrix

Ten strategies cover the container × encode-mode combinations:

| Container | Single-pass | Two-pass |
| --- | --- | --- |
| HLS | HlsSinglePassStrategy | HlsTwoPassStrategy (per-variant stats) |
| DASH | DashSinglePassStrategy | DashTwoPassStrategy |
| MP4 | Mp4SinglePassStrategy | Mp4TwoPassStrategy |
| MKV | MkvStrategy | none (not meaningful on lossless-capable container) |
| MP3 | Mp3Strategy | |
| FLAC | FlacStrategy | none (lossless, pass concept N/A) |
| OGG | OggStrategy | |

All strategies compose the same building blocks (IChapterWriter, IFontExtractor, ISubtitleExtractor, IAbrLadderGenerator, IPlaylistGenerator) — format-specific logic is isolated to the strategy itself.

Profiles and Presets

An encoding profile is the declarative description of what you want ffmpeg to produce. The encoder reads a profile + a source file and builds the command-line automatically. Users rarely touch ffmpeg flags directly.

Profile structure

{
  "Name": "General — 1080p",
  "Format": "Hls",
  "SegmentDurationSeconds": 6,
  "AutoLadder": false,
  "AutoDetectCrop": false,
  "EncodeMode": "SinglePass",
  "VideoOutputs": [
    {
      "Codec": "H264",
      "Width": 1920,
      "Height": 1080,
      "BitrateKbps": 0,
      "Crf": 23,
      "Preset": "medium",
      "Profile": "high",
      "Level": "4.1",
      "KeyframeIntervalSeconds": 2,
      "TenBit": false,
      "ConvertHdrToSdr": false
    }
  ],
  "AudioOutputs": [
    {
      "Codec": "Aac",
      "BitrateKbps": 192,
      "Channels": 2,
      "SampleRateHz": 48000,
      "AllowedLanguages": ["en", "ja"],
      "Loudness": "None",
      "Downmix": "Auto"
    }
  ],
  "SubtitleOutputs": [
    {
      "Codec": "WebVtt",
      "Mode": "Extract",
      "AllowedLanguages": []
    }
  ]
}

VideoOutput fields

Field Purpose
Codec H264 / H265 / Av1 / Vp9
Width, Height Target dimensions. Height optional — null means "preserve aspect ratio."
BitrateKbps Target bitrate (CBR/VBR mode). Set to 0 when using CRF.
Crf Constant Rate Factor (CRF mode). Set to 0 when using bitrate.
Preset Encoder preset — medium, slow, p4, etc. Family-aware.
Profile Codec profile — high, main10, profile0, etc.
Level Codec level — 4.1, 5.1, etc. Optional; the encoder picks if unset.
KeyframeIntervalSeconds GOP length in seconds.
TenBit 10-bit pixel format.
ConvertHdrToSdr Tonemap HDR source → SDR output.
Tune Codec tune (libx264: animation / film / grain / etc.)
CustomArguments Power-user escape hatch for flags not covered by the schema. Validator blocks the ones that collide with resolver-managed flags.

CRF and bitrate are mutually exclusive. Set one and leave the other at 0. The validator warns if both are set and tells you which one wins (CRF).

AudioOutput fields

Field Purpose
Codec One of 9 audio codecs.
BitrateKbps Target bitrate. Ignored for lossless codecs (FLAC, TrueHD).
Channels 1 / 2 / 6 / 8 (codec-dependent).
SampleRateHz 44100 / 48000 / 96000 / 192000 (codec-dependent).
AllowedLanguages ISO 639 language tag filter. Empty = accept all.
Loudness None / EbuR128 / ReplayGain / Custom — loudness normalization.
Downmix Auto / StereoItuR128 / Mono / Custom — explicit pan matrix for downmix.
CustomPanMatrix When Downmix = Custom, the raw ffmpeg pan expression.

SubtitleOutput fields

Field Purpose
Codec WebVtt / Ass / Srt.
Mode Extract (sidecar file) / BurnIn (burned into video) / PassThrough (copy into container).
AllowedLanguages Language filter, same shape as audio.

Built-in preset library

10 presets ship out of the box, covering the common hardware targets. All pass the validator with no errors. Copy any of them as a starting point for your own presets.

General — 1080p Fast

H.264 1080p @ medium preset, CRF 23. AAC 192 kbps stereo in HLS. Balanced streaming for desktop + most mobile clients.

Web — 720p

H.264 720p @ fast preset, CRF 24. AAC 192 kbps. Small-file HLS for slow connections.

Archival — H.265 1080p

HEVC 1080p @ slow preset, CRF 20. AAC 192 kbps. Smaller files at higher quality than H.264 equivalent; slower encode.

Anime — 1080p

libx264 tune=animation — tweaks psy-rd + keyframe settings for flat-colour content. CRF 22, slow preset.

Music — AAC 192k

Audio-only MP4/M4A. AAC 192 kbps stereo. For music library encoding.

Chromecast — 1080p

H.264 High Profile Level 4.1 1080p + AAC stereo in HLS. Caps at L4.1 because older Chromecasts + smart TVs reject higher levels.

Apple TV 4K — HDR

HEVC Main10 2160p Level 5.1 + EAC3 5.1 in HLS. Preserves HDR passthrough when source is HDR (no tonemap, 10-bit output, Dolby Vision RPU survives when source has it).

Mobile — 480p Low Bandwidth

H.264 Main L3.1 @ 900 kbps + AAC 96 kbps stereo. Bitrate-capped (not CRF) for predictable file size. 3G / bad-wifi friendly.

4K Archival — HEVC + FLAC

HEVC Main10 2160p CRF 18 (visually lossless) + FLAC 5.1 in MKV. Massive files — for long-term archival of source-quality material, not streaming.

Anime — HEVC 10-bit 1080p

HEVC 10-bit avoids color banding on flat-shaded content much better than H.264 8-bit. Opus 5.1 + MKV (keeps ASS subtitle typesetting intact; HLS would convert to WebVTT and lose styling).

Preset inheritance

Presets can chain: a preset named "Studio 4K Archive" can have ParentPresetId pointing at "4K Archival — HEVC + FLAC" and override just the Name + Tags. The PresetResolver walks the parent chain at resolve time, merging child fields on top of the parent.

Cycle-safe — a preset chain that loops back to itself gets caught and rejected with a clear error.
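
A rough Python sketch of parent-chain resolution with cycle detection follows. It assumes presets are plain dictionaries keyed by id and that merging is shallow (a child field replaces the parent field wholesale); the actual PresetResolver may merge more finely. `resolve_preset` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set


def resolve_preset(preset_id: str, presets: Dict[str, dict]) -> dict:
    """Merge a preset with its ancestors; child fields override parent fields."""
    chain: List[dict] = []
    seen: Set[str] = set()
    current: Optional[str] = preset_id
    while current is not None:
        if current in seen:
            raise ValueError(f"preset inheritance cycle at '{current}'")
        seen.add(current)
        preset = presets[current]
        chain.append(preset)
        current = preset.get("ParentPresetId")

    merged: dict = {}
    for preset in reversed(chain):  # root ancestor first, requested preset last
        merged.update({k: v for k, v in preset.items() if v is not None})
    return merged
```

Tracking visited ids in a set means a loop of any length is caught on the first repeated id rather than after a fixed depth limit.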

Auto-ladder

When AutoLadder = true, the single reference VideoOutput in the profile gets expanded into a full ABR ladder based on source resolution. The AbrLadderGenerator:

  • Skips tiers above source height (no upscaling)
  • Scales bitrate per tier by source complexity (thin sources get thinner tiers — anime at 2 Mbps source shouldn't produce a 720p variant at 3 Mbps)
  • Generates tiers at 360 / 480 / 720 / 1080 / 1440 / 2160 depending on source

Standard Apple HLS authoring bitrates:

Height Default bitrate
360p 800 kbps
480p 1400 kbps
720p 3000 kbps
1080p 6000 kbps
1440p 9000 kbps
2160p 15000 kbps
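
The tier selection can be sketched in Python as below. The default bitrates are copied from the table above. The complexity scaling is not specified in this report, so the sketch substitutes a much cruder rule (never exceed the source bitrate) and labels it as an assumption. `build_ladder` is a hypothetical name.

```python
from typing import List, Tuple

# Default bitrates (kbps) per tier height, copied from the table above.
DEFAULT_TIERS: List[Tuple[int, int]] = [
    (360, 800), (480, 1400), (720, 3000),
    (1080, 6000), (1440, 9000), (2160, 15000),
]


def build_ladder(source_height: int, source_kbps: int) -> List[Tuple[int, int]]:
    """Return (height, kbps) rungs that never upscale the source."""
    ladder = []
    for height, kbps in DEFAULT_TIERS:
        if height > source_height:
            break  # tiers are sorted, so every later tier would upscale too
        # Assumption: capping at the source bitrate stands in for the
        # complexity-based scaling that the real generator applies.
        ladder.append((height, min(kbps, source_kbps)))
    return ladder


print(build_ladder(720, 2000))  # prints [(360, 800), (480, 1400), (720, 2000)]
```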

Profile import / export

Users can export any preset as a portable JSON file:

GET /api/v1/dashboard/encoding/presets/{id}/export

Returns a PresetExport record with Name, Description, Author, Tags, ProfileJson. Save it to a file and share it; the recipient POSTs it to /import and the preset appears in their own library.

Two convenience import paths:

POST /api/v1/dashboard/encoding/presets/import       ← paste JSON body
POST /api/v1/dashboard/encoding/presets/import-url   ← pull from HTTPS URL

Import-URL only accepts https:// schemes. HTTP is rejected because MITM attacks on preset imports are trivial otherwise — and since presets execute in the encoder (via the command-line they generate), silently injecting hostile flags is a real risk.
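
The scheme restriction could look something like this Python sketch, which uses only the standard library's `urllib.parse.urlsplit`. The function name is hypothetical, and the extra requirement that the URL names a host is an assumption beyond what the text states.

```python
from urllib.parse import urlsplit


def is_allowed_import_url(url: str) -> bool:
    """Accept only absolute https:// URLs that name a host."""
    parts = urlsplit(url.strip())
    # urlsplit lowercases the scheme, so "HTTPS://" is treated like "https://".
    return parts.scheme == "https" and bool(parts.hostname)
```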

Cloning built-ins

Built-in presets are marked IsBuiltIn = true and can't be edited or deleted directly. That protects the seed from accidental damage. To tweak one, clone it:

POST /api/v1/dashboard/encoding/presets/{id}/clone

Creates a copy with IsBuiltIn = false and an auto-suffixed name ("General — 1080p (copy)" or "(copy 2)" if that's already taken). The clone is fully editable.
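
One way to produce the auto-suffixed name is sketched here in Python. It assumes the set of names already in use is available to the caller; `clone_name` is a hypothetical helper, not the endpoint's actual code.

```python
from typing import Set


def clone_name(original: str, taken: Set[str]) -> str:
    """Return "<name> (copy)", or "<name> (copy N)" for the first free N >= 2."""
    candidate = f"{original} (copy)"
    counter = 2
    while candidate in taken:
        candidate = f"{original} (copy {counter})"
        counter += 1
    return candidate
```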

The Safety Net

Most encoder platforms fail at encode time with cryptic ffmpeg errors after minutes of wasted work. The NoMercy encoder tries to fail at profile-save time with actionable fix guidance.

This page covers the 30+ validation checks that run before any ffmpeg process spawns, and the two API endpoints that expose them to the UI.

Philosophy

  • Errors block the profile. The encode can't run with this setting.
  • Warnings let the profile through but flag configurations that will silently produce wrong output.
  • Messages always include fix guidance. Not "CRF 0-51 required" but "CRF 70 is outside the valid range [0, 51] for H264."

Structural validation (always runs)

Check Severity Message pattern
Profile has a name Error "Profile must have a non-empty name"
At least one video or audio output Error "Profile must have at least one video or audio output"
Video width > 0 Error "Width must be greater than 0"
Either Bitrate > 0 or CRF > 0 Error "Video output must specify either BitrateKbps > 0 or Crf > 0"
CRF within codec's software range Error "Crf value 70 is outside the valid range [0, 51] for H264"
KeyframeIntervalSeconds >= 0 Error

Codec vs container compatibility

HLS is the strictest. DASH allows a superset. MP4 permits nonstandard combinations with warnings. MKV allows everything. Audio-only containers require matching audio codec.

Container Video allowed Audio allowed
HLS H264, H265, AV1 AAC, AC3, EAC3, Opus
DASH H264, H265, AV1, VP9 AAC, Opus, AC3, EAC3
MP4 H264, H265, AV1 (VP9 warns as non-standard) all allowed
MKV everything everything
MP3 — (video forbidden) MP3 only
FLAC — (video forbidden) FLAC only
OGG — (video forbidden) Vorbis, Opus, FLAC

Each violation produces an error naming the exact field and what's allowed.
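
The video column of the table above can be expressed as a lookup, as in this Python sketch. The field-path format and the message wording are illustrative guesses, not the validator's actual output, and `check_video_codecs` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set

# Transcribed from the compatibility table; None means "everything allowed".
VIDEO_ALLOWED: Dict[str, Optional[Set[str]]] = {
    "Hls": {"H264", "H265", "Av1"},
    "Dash": {"H264", "H265", "Av1", "Vp9"},
    "Mp4": {"H264", "H265", "Av1", "Vp9"},  # Vp9 is accepted but warns
    "Mkv": None,
    "Mp3": set(), "Flac": set(), "Ogg": set(),  # audio-only containers
}


def check_video_codecs(container: str, codecs: List[str]) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {"errors": [], "warnings": []}
    allowed = VIDEO_ALLOWED[container]
    for index, codec in enumerate(codecs):
        field = f"VideoOutput[{index}].Codec"  # hypothetical field path
        if allowed is not None and codec not in allowed:
            options = ", ".join(sorted(allowed)) or "no video at all"
            result["errors"].append(
                f"{field}: {codec} is not allowed in {container} (allowed: {options})"
            )
        elif container == "Mp4" and codec == "Vp9":
            result["warnings"].append(f"{field}: Vp9 in Mp4 is non-standard")
    return result
```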

Audio codec validation

Per codec, the validator checks:

  • Channels in supported set. AC3 rejects 7.1, MP3 rejects anything above stereo, DTS core rejects 7.1.
  • Sample rate in supported set. Opus rejects ≠48 kHz, AC3/EAC3/DTS require 48 kHz.
  • Bitrate in range. AAC 32–512, AC3 32–640, EAC3 32–6144, etc.
  • Lossless codecs skip bitrate check. FLAC / TrueHD use 0 and the check is gated by IsLossless.

Video quality-range traps

Level vs dimensions

H.264 / H.265 / VP9 declare a max frame size per level. A 4K output declared as H.264 Level 4.1 plays on modern desktop browsers, but hardware decoders (iOS, smart TVs, set-top boxes) reject the stream because L4.1 caps at 1080p.

The validator loads a level-table per codec and rejects mismatches:

Error: H264 Level 4.1 caps at 1080p (1920x1080) but the output is 
3840x2160 (8,294,400 pixels). Hardware decoders (iOS, set-top boxes, 
TVs) reject streams above the declared level. Raise the level or 
drop the resolution.
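
A level check of this kind can be sketched in Python using the frame-size limits from the H.264 level table, which are expressed in 16x16 macroblocks. Only a few levels are listed here, and the sketch ignores the frame-rate and bitrate limits that levels also impose. Whether the real validator compares macroblocks or pixels is not stated; the function names are hypothetical.

```python
import math

# Maximum frame size in 16x16 macroblocks for a few H.264 levels.
# A complete table would list every level.
H264_MAX_FRAME_MACROBLOCKS = {
    "3.1": 3600, "4.0": 8192, "4.1": 8192, "5.0": 22080, "5.1": 36864,
}


def frame_macroblocks(width: int, height: int) -> int:
    # Dimensions round up to whole macroblocks, so 1080 rows count as 68.
    return math.ceil(width / 16) * math.ceil(height / 16)


def level_allows(level: str, width: int, height: int) -> bool:
    return frame_macroblocks(width, height) <= H264_MAX_FRAME_MACROBLOCKS[level]


print(level_allows("4.1", 1920, 1080))  # prints True  (8160 <= 8192)
print(level_allows("4.1", 3840, 2160))  # prints False (32400 > 8192)
```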

VP9 profile vs bit depth

VP9's bit depth is locked to its profile number:

  • profile0 and profile1 → 8-bit only
  • profile2 and profile3 → 10 / 12-bit

Setting TenBit = true with profile0 produces either a silent 8-bit downgrade or a malformed stream. The validator rejects with "Use profile2 or profile3 for 10-bit VP9."

10-bit H.264 warning

Every H.264 hardware encoder lacks 10-bit support. The validator doesn't error (libx264 can deliver it) but warns:

Warning: 10-bit H264 has no hardware encoder support (every vendor's 
H.264 hardware path is 8-bit for this codec). Output will be 
software-encoded or silently downgraded to 8-bit.

CRF quality-cliff

CRF 35+ on H.264/H.265 produces blocky, smeared output. CRF 45+ on VP9 and 50+ on AV1 are the equivalent zones on their wider scales. The validator warns so that typos like "53" (where the user meant "23") are caught before encode time.

Bitrate-per-resolution floor

4K H.264 at 2 Mbps is unwatchable regardless of preset. The validator knows codec-aware floors per resolution:

| Resolution | H.264 | HEVC / AV1 / VP9 |
| --- | --- | --- |
| 1080p | 2500 kbps | 1500 kbps |
| 1440p | 7500 kbps | 5000 kbps |
| 2160p | 12000 kbps | 8000 kbps |

Only fires when bitrate rate-control is active (BitrateKbps > 0). CRF mode is exempt — no bitrate target, nothing to floor against.

Audio quality warnings

AC3 stepped bitrate ladder

AC3 accepts only 19 specific bitrate values. ffmpeg silently rounds off-ladder values down — user asks for 100 kbps, gets 96, with no warning from ffmpeg. The validator catches:

Warning: Ac3 requires an exact bitrate from its ladder. 100 kbps will 
be silently rounded down to 96 kbps by the encoder. Use 96 or 112 
kbps to get the bitrate you asked for.
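
The "use 96 or 112" suggestion amounts to finding the two ladder steps around the requested value. A minimal Python sketch, using the 19 standard AC3 bitrates and a hypothetical helper name:

```python
from bisect import bisect_right
from typing import Optional, Tuple

# The 19 AC3 bitrates, in kbps.
AC3_LADDER = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160,
              192, 224, 256, 320, 384, 448, 512, 576, 640]


def ac3_neighbours(kbps: int) -> Optional[Tuple[int, int]]:
    """Return the (lower, upper) ladder steps for an off-ladder bitrate, else None."""
    if kbps in AC3_LADDER or not AC3_LADDER[0] < kbps < AC3_LADDER[-1]:
        return None  # already on the ladder, or a job for the range check
    upper_index = bisect_right(AC3_LADDER, kbps)
    return AC3_LADDER[upper_index - 1], AC3_LADDER[upper_index]


print(ac3_neighbours(100))  # prints (96, 112)
```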

Channel-aware quality floors

  • AC3 / EAC3 surround below 384 kbps — Dolby's own recommended floor for 5.1. Below this produces audibly compressed audio.
  • AAC stereo above 256 kbps — AAC plateaus around 192-256 for stereo. Higher is wasted bandwidth.
  • AAC 5.1/7.1 below 256 kbps — under-provisioned, dialog suffers.

HLS/DASH segment-keyframe alignment

HLS and DASH segments should start on a keyframe. When segment duration isn't an integer multiple of keyframe interval, the encoder inserts extra keyframes (raising bitrate) or segments drift in length (players re-buffer on every drift).

Warning: KeyframeIntervalSeconds (2s) does not evenly divide 
SegmentDurationSeconds (5s). Use a key interval that divides the 
segment duration (e.g. 2s key / 6s segment).
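
The divisibility test behind this warning is small enough to sketch in Python. Treating an interval of 0 as "let the encoder decide" is an assumption, and `keyframes_align` is a hypothetical name.

```python
def keyframes_align(segment_seconds: float, keyframe_seconds: float) -> bool:
    """True when the segment length is a whole multiple of the keyframe interval."""
    if keyframe_seconds <= 0:
        return True  # assumption: 0 means "encoder default", nothing to check
    ratio = segment_seconds / keyframe_seconds
    # A tolerance keeps fractional intervals such as 1.5s from failing on float noise.
    return round(ratio) >= 1 and abs(ratio - round(ratio)) < 1e-9


print(keyframes_align(6, 2))  # prints True
print(keyframes_align(5, 2))  # prints False
```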

Cross-output checks

Duplicate variants

Two video outputs with the same codec, resolution, bitrate, CRF, and bit depth → encoder runs the same encode twice, produces duplicate files. Warning with "remove one."

Inverted ABR ladder

1080p variant at 2 Mbps underneath 720p variant at 5 Mbps means ABR-capable players have no reason to switch up. Warning.

Streaming profile with no audio

HLS / DASH with video but zero audio outputs breaks iOS and many smart-TV clients. Legit for silent video (CCTV, demo loops) — warn, don't error.

Audio language-filter overlap

Two audio outputs with the same codec + channel count + overlapping language filters → both encode the same source streams. Usually means the user forgot to set one of the filters. Warn with "narrow one or remove the duplicate."

CustomArguments reserved-flag guards

CustomArguments is the power-user escape hatch — {"-metadata": "author=me"} etc. But certain flags are resolver-managed. Letting a user override them silently clobbers the encoder's choice:

  • Codec flags: -c:v, -c:a, -c:s, -vcodec, -acodec
  • Preset / profile / level / quality: -preset, -profile:v, -level, -crf, -cq, -qp, -global_quality, -q:v
  • Rate-control: -rc, -b:v, -b:a, -maxrate, -minrate, -bufsize
  • Pixel / dimensions: -s, -pix_fmt
  • Filter chains, stream mapping, output format: -vf, -map, -f
  • Tag / codec-specific: -tag:v
  • HLS / DASH: -hls_time, -hls_segment_filename, -hls_playlist_type, -seg_duration
  • Hardware init: -init_hw_device, -filter_hw_device, -gpu, -hwaccel

Using any of these in CustomArguments on video / audio / subtitle outputs raises an error with "collides with a flag the encoder pipeline controls." Safe flags like -metadata pass through.
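
A guard like this reduces to a set lookup. The Python sketch below includes only part of the reserved list above to stay short, and the message wording and function name are illustrative.

```python
from typing import Dict, List

# A subset of the reserved flags listed above; the full list would go here.
RESERVED_FLAGS = {
    "-c:v", "-c:a", "-c:s", "-vcodec", "-acodec",
    "-preset", "-profile:v", "-level", "-crf", "-cq", "-qp",
    "-b:v", "-b:a", "-maxrate", "-minrate", "-bufsize",
    "-s", "-pix_fmt", "-vf", "-map", "-f", "-tag:v",
}


def reserved_flag_errors(custom_arguments: Dict[str, str]) -> List[str]:
    """Return one error per custom argument that the pipeline already controls."""
    return [
        f"CustomArguments: {flag} collides with a flag the encoder pipeline controls"
        for flag in custom_arguments
        if flag.strip().lower() in RESERVED_FLAGS
    ]
```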

POST /api/v1/dashboard/encoding/presets/validate

The dashboard UI calls this on every profile keystroke:

POST /validate
{
  "profile_json": "{ ... }"
}

Response:
{
  "valid": true,
  "errors": [],
  "warnings": [
    {
      "field": "VideoOutput[0].TenBit",
      "message": "10-bit H264 has no hardware encoder support..."
    }
  ]
}

Field paths let the UI highlight the exact bad field. Messages carry fix guidance inline.

POST /api/v1/dashboard/encoding/presets/preview

Once a source file is selected, /preview tells you what will actually happen per stream:

POST /preview
{
  "profile_json": "{ ... }",
  "video_file_id": "<ULID>"
}

Response:
{
  "source": {
    "path": "...",
    "duration_seconds": 5400,
    "video_streams": 1,
    "audio_streams": 2,
    "subtitle_streams": 3,
    "is_hdr": true,
    "has_dolby_vision": true,
    "is_variable_frame_rate": false
  },
  "video_plan": [{
    "source_codec": "hevc",
    "source_resolution": "3840x2160",
    "target_codec": "H265",
    "target_resolution": "1920x1080",
    "action": "Transcode",
    "rationale": "Transcoding: resolution 3840px → 1920px."
  }],
  "audio_plan": [{
    "source_codec": "dts",
    "source_channels": 6,
    "target_codec": "Aac",
    "target_channels": 2,
    "action": "Transcode",
    "rationale": "dts is not supported in HLS — transcoding to Aac."
  }],
  "subtitle_plan": [...],
  "source_warnings": [
    { "field": "source.hdr", "severity": "info",
      "message": "Source is HDR10/HLG — will be preserved..." }
  ],
  "warnings": [...],
  "errors": []
}

Before clicking Encode, users see exactly what will happen: DTS getting transcoded, PGS subs getting OCR'd, Dolby Vision getting stripped, HDR getting tonemapped to SDR. No surprises.

Hardware and HDR

The encoder detects, measures, and exploits the hardware encoders available on the host. It also handles HDR sources — passing through when possible, tonemapping to SDR otherwise.

Hardware detection

IHardwareDetector enumerates GPUs at startup. On first boot (or when drivers change) the HardwareBenchmarkHostedService runs a calibration pass.

What gets detected

| Vendor | Detection method | Encoder families |
|---|---|---|
| NVIDIA | nvidia-smi + NVML | NVENC (H.264 / HEVC / AV1) |
| AMD | rocm-smi on Linux, DXGI on Windows | AMF (H.264 / HEVC / AV1) |
| Intel | intel-gpu-top / Media SDK probe | QSV + VAAPI (all codecs) |
| Apple | Metal device query | VideoToolbox (H.264 / HEVC) |

Per GPU: name, VRAM, max concurrent encoder sessions, which video codecs the card can encode. The detector uses this to filter encoders and feed the benchmark.

Per-GPU calibration

The benchmark runs once per (encoder × GPU × resolution tier). A host with two NVIDIA cards benchmarks h264_nvenc on both separately — the speed index distinguishes them by device name.

Tier selection

Standard tiers for every target:

  • 1920×1080
  • 1280×720
  • 854×480

High-VRAM GPUs (≥6 GB) also run:

  • 3840×2160 (4K)

Software targets and low-VRAM GPUs skip 4K — calibration at 4K on a slow software encoder eats tens of seconds per probe, and low-VRAM cards fall back to tiled encoding paths that don't represent real-world throughput.
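The tier rule can be sketched in a few lines of Python. The function name and arguments are hypothetical; the tiers and the 6 GB threshold are the ones listed above.

```python
from typing import Optional

STANDARD_TIERS = [(1920, 1080), (1280, 720), (854, 480)]
UHD_TIER = (3840, 2160)
HIGH_VRAM_GB = 6  # threshold from the text above


def calibration_tiers(is_hardware: bool, vram_gb: Optional[float]) -> list[tuple[int, int]]:
    """Standard tiers for every target; 4K only for high-VRAM GPUs."""
    tiers = list(STANDARD_TIERS)
    if is_hardware and vram_gb is not None and vram_gb >= HIGH_VRAM_GB:
        tiers.append(UHD_TIER)
    return tiers
```

A software target is modelled here as `is_hardware=False` with no VRAM figure, so it always gets the three standard tiers.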

Calibration source

Lavfi synthetic source — no real media file needed:

ffmpeg -init_hw_device cuda=cu:0 -filter_hw_device cu \
       -f lavfi -i testsrc=duration=1:size=1920x1080:rate=30 \
       -vf format=nv12,hwupload_cuda \
       -c:v h264_nvenc -gpu 0 -preset p4 \
       -frames:v 30 -f null -

Key details:

  • -init_hw_device cuda=cu:N initialises CUDA device N so NVENC lands on the correct card on multi-GPU boxes
  • hwupload_cuda (a filter in the -vf chain, not a flag) uploads lavfi frames to GPU so measurement excludes CPU→GPU transfer
  • -frames:v 30 caps output at 30 frames — libaom-av1 takes 8+ minutes per tier at full 5s duration; 30 frames is enough for a stable fps reading
  • -f null - discards output — no disk write
  • stderr is logged on probe failure so "CUDA init failed" errors surface (they used to go silent)

Vendor-specific init

| Vendor | Init args | Encoder selector |
|---|---|---|
| NVIDIA | -init_hw_device cuda=cu:N -filter_hw_device cu | -gpu N |
| Intel QSV | -init_hw_device qsv=hw -filter_hw_device hw | (hwupload with extra_hw_frames=16) |
| AMD / Apple | Auto-init (no extra args) | |
| Software | No hardware args | |

Results

Calibration produces a SpeedIndex keyed by (codec, encoder, resolution, device name):

{
  "H264:h264_nvenc:1920:RTX 4080": { "fps": 280, "speed": 9.3 },
  "H264:h264_nvenc:1920:RTX 3060": { "fps": 180, "speed": 6.0 },
  "H264:libx264:1920:null": { "fps": 55, "speed": 1.8 }
}

Persisted via ISpeedIndexStore (default: JSON file under the cache dir). Recalibrates after 30 days or when the user explicitly asks via the dashboard API.

The dispatcher and ABR ladder generator both use the speed index to prioritize the faster encoder and distribute work proportionally to measured throughput.
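As an illustration of "proportional to measured throughput", here is a Python sketch that builds a speed-index key in the format shown above and splits a number of work units across devices by fps. The function names are hypothetical, and the largest-remainder rounding is an assumption: the report does not say how the dispatcher rounds.

```python
from typing import Optional


def speed_index_key(codec: str, encoder: str, width: int, device: Optional[str]) -> str:
    """Key format taken from the JSON above; software encoders use 'null'."""
    return f"{codec}:{encoder}:{width}:{device if device is not None else 'null'}"


def split_work(total_units: int, fps_by_worker: dict[str, float]) -> dict[str, int]:
    """Split units proportionally to fps (largest-remainder rounding, assumed)."""
    total_fps = sum(fps_by_worker.values())
    exact = {w: total_units * fps / total_fps for w, fps in fps_by_worker.items()}
    shares = {w: int(x) for w, x in exact.items()}
    leftover = total_units - sum(shares.values())
    # Hand the remaining units to the workers with the largest fractional part.
    by_remainder = sorted(exact, key=lambda w: exact[w] - shares[w], reverse=True)
    for worker in by_remainder[:leftover]:
        shares[worker] += 1
    return shares
```

With the two NVIDIA cards from the example (280 and 180 fps), 23 units split as 14 and 9.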

Encoder selection

When a profile says "H264", the CodecResolver picks an encoder according to the configured preference:

| Preference | Behavior |
|---|---|
| PreferHardware (default) | First available HW encoder, else software |
| PreferSpeed | Same as PreferHardware |
| ForceSoftware | libx264 regardless of hardware presence |
| ForceHardware | HW encoder or throw |
| PreferQuality | Software (quality still beats HW at matched bitrate) |

On a multi-GPU box the resolver picks the first matching GPU — the dispatcher later distributes work across all cards.
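The preference table reads naturally as a small decision function. This Python sketch is illustrative only: the function name and the idea of passing an ordered list of available hardware encoders are assumptions, not the CodecResolver's actual interface.

```python
def resolve_encoder(preference: str, hardware_encoders: list[str], software_encoder: str) -> str:
    """Pick an encoder name following the preference table above."""
    if preference in ("PreferHardware", "PreferSpeed"):
        # First available hardware encoder, else fall back to software.
        return hardware_encoders[0] if hardware_encoders else software_encoder
    if preference in ("ForceSoftware", "PreferQuality"):
        return software_encoder
    if preference == "ForceHardware":
        if not hardware_encoders:
            raise RuntimeError("ForceHardware requested but no hardware encoder is available")
        return hardware_encoders[0]
    raise ValueError(f"unknown preference: {preference}")
```

For example, `resolve_encoder("PreferHardware", [], "libx264")` falls back to `libx264`, while `ForceHardware` with an empty list raises.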

HDR handling

Three HDR paths depending on source + profile:

1. HDR passthrough

Source is HDR10 / HLG / Dolby Vision. Profile targets HEVC 10-bit, no tonemap. The encoder preserves color metadata (the values below are the PQ / HDR10 case):

-color_primaries bt2020 -color_trc smpte2084 
-colorspace bt2020nc -color_range tv

For Dolby Vision sources, the output also needs -tag:v dvh1 in MP4 / HLS so Apple clients route through the DV decoder. The planner sets this automatically when:

  • Source has Dolby Vision side-data
  • Profile is HEVC 10-bit, no tonemap
  • Container is HLS, MP4, or MKV (MKV preserves DV without the tag)

2. HDR → SDR tonemap

Source is HDR, profile has ConvertHdrToSdr = true OR profile is 8-bit output. The TonemapSelector picks the fastest available tonemap method:

| Method | Filter chain | GPU-accelerated | Requires |
|---|---|---|---|
| libplacebo | libplacebo=tonemapping=hable:... | Yes (Vulkan) | libplacebo filter in ffmpeg |
| tonemap_opencl | tonemap_opencl=tonemap=hable:desat=0:format=nv12 | Yes (OpenCL) | OpenCL-capable GPU |
| zscale+tonemap | CPU path via zscale | No | always works |

Priority: libplacebo → tonemap_opencl → zscale. CPU tonemap works on any host but is 5-10× slower than the GPU paths.
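A Python sketch of that priority pick, assuming the caller already knows which methods the local ffmpeg build and GPU support. The function name and the set-of-strings input are hypothetical.

```python
# Priority order from the text: GPU paths first, CPU path last.
TONEMAP_PRIORITY = ["libplacebo", "tonemap_opencl", "zscale"]


def select_tonemap(available: set[str]) -> str:
    """Return the highest-priority available method; zscale is the fallback."""
    for method in TONEMAP_PRIORITY:
        if method in available:
            return method
    return "zscale"  # described above as the path that always works
```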

Algorithm defaults to Hable — preserves dark detail better than Reinhard on most real-world content. Configurable via HdrOptions:

new HdrOptions(
    Algorithm: TonemapAlgorithm.Hable,  // or Reinhard / Mobius / Bt2390
    CustomLutPath: null,                 // path to 3D LUT .cube file
    LutApply: LutApplication.AfterTonemap,
    Desat: 0.0,
    Peak: 0.0,
    PreserveMetadata: false
)

3. SDR → HDR (not supported)

Inverse tonemapping produces artifacts that look worse than the source. The encoder never attempts it.

10-bit downgrade guard

When a profile requests TenBit = true but the resolved encoder has Supports10Bit = false, the plan stage downgrades to 8-bit with a warning:

Warning: Profile requests 10-bit video_0 but encoder h264_nvenc does 
not support 10-bit. Downgrading to 8-bit output.

Without this guard the output would emit an empty pixel-format string and ffmpeg would either pick the source's format (wrong on 10-bit sources) or fail at runtime with "Invalid pixel format."

Canonical cases:

  • Every H.264 hardware encoder is 8-bit
  • hevc_videotoolbox is 8-bit (videotoolbox only supports 10-bit HEVC via a separate code path not exposed through ffmpeg)

HEVC on NVENC / AMF / QSV / VAAPI all support 10-bit natively — the downgrade never fires on HEVC except on Apple hardware.
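The guard itself is a small check at plan time. A Python sketch follows; the function name and tuple return are hypothetical, while the warning text follows the example above.

```python
from typing import Optional


def apply_ten_bit_guard(output_id: str, ten_bit_requested: bool,
                        encoder: str, supports_10bit: bool) -> tuple[bool, Optional[str]]:
    """Return (effective_ten_bit, warning). Downgrades instead of failing."""
    if ten_bit_requested and not supports_10bit:
        warning = (
            f"Profile requests 10-bit {output_id} but encoder {encoder} does "
            "not support 10-bit. Downgrading to 8-bit output."
        )
        return False, warning
    return ten_bit_requested, None
```

Returning the effective bit depth explicitly means the later pixel-format step never sees an unresolved value.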

Dolby Vision

The MediaAnalyzer parses Dolby Vision side-data from ffprobe:

{
  "profile": 8,
  "level": 6,
  "has_rpu": true,
  "has_el": false,
  "bl_compat": "Hdr10"
}

  • profile: 5 (single-layer) / 7 (dual-layer) / 8 (hybrid, HDR10 compatible) / etc.
  • bl_compat: what the Base Layer degrades to if the client can't decode DV — Hdr10, Sdr, or None

Planner's DV preservation logic:

  • Source has DV metadata (RPU present)
  • Output is HEVC 10-bit
  • Output format is HLS / MP4 / MKV
  • ConvertHdrToSdr is false

All four must be true. Any violation strips the RPU and the plan stage logs a warning so the user knows DV won't survive.
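Expressed as a predicate, the four conditions look like this. It is a Python sketch with hypothetical parameter names; the codec and container labels mirror the ones used in the preview response earlier.

```python
DV_CONTAINERS = {"HLS", "MP4", "MKV"}


def preserves_dolby_vision(has_rpu: bool, target_codec: str, ten_bit: bool,
                           container: str, convert_hdr_to_sdr: bool) -> bool:
    """True only when all four preservation conditions hold."""
    return (
        has_rpu
        and target_codec == "H265" and ten_bit
        and container in DV_CONTAINERS
        and not convert_hdr_to_sdr
    )
```

Any `False` result corresponds to the case where the RPU is stripped and a warning is logged.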

HDR10+

Detected from ffprobe side-data. Currently treated like HDR10 — the dynamic per-scene metadata survives in HEVC passthrough encodes but the encoder doesn't inject HDR10+ metadata when the source doesn't have it.

System resource monitoring

ProcessResourceMonitor tracks live ffmpeg processes for CPU / RAM / GPU utilization. The dispatcher reads these to avoid kicking off new tasks when an existing encode is saturating the box.

On the dashboard:

GET /api/v1/dashboard/workers   → per-worker capacity snapshots
GET /api/v1/dashboard/workers/tasks/progress → live task progress

Capacity is GPU slots available + CPU threads available per worker. A worker with 0 available slots gets a fallback weight of 1 so it still receives work: the assigner would rather overload one box than strand tasks.

Content Analysis

The encoder doesn't just encode — it can look at a source and derive information that improves the encode or the playback experience: crop detection, intro / outro markers, OCR'd subtitles, speech transcription.

Each analyzer runs as a standalone building block. You can invoke them from the dashboard for spot-checking, or let the subscribers run them automatically when a file is ingested.

Crop detection

Problem: letterboxed sources waste encoding budget on black bars and produce awkward aspect ratios on non-16:9 displays.

Solution: the ICropDetector runs ffmpeg's cropdetect filter against a sample of frames and returns a CropResult:

{
  "should_crop": true,
  "width": 1920,
  "height": 804,
  "x": 0,
  "y": 138
}

should_crop is false when the detected rectangle matches the full frame (nothing to crop). The planner feeds the result into the filter chain as crop=1920:804:0:138 so ffmpeg crops before scaling.
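A small Python sketch of how a detected rectangle could become the filter string. The helper names are hypothetical; the field names and the `crop=w:h:x:y` form come from the example above.

```python
from typing import Optional


def make_crop_result(src_w: int, src_h: int, w: int, h: int, x: int, y: int) -> dict:
    """should_crop is False when the rectangle covers the full frame."""
    full_frame = (w, h, x, y) == (src_w, src_h, 0, 0)
    return {"should_crop": not full_frame, "width": w, "height": h, "x": x, "y": y}


def crop_filter(result: dict) -> Optional[str]:
    """Build the ffmpeg crop filter, or None when there is nothing to crop."""
    if not result["should_crop"]:
        return None
    return f"crop={result['width']}:{result['height']}:{result['x']}:{result['y']}"
```

For a 1920×1080 source with 138-pixel bars top and bottom, this gives `crop=1920:804:0:138`.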

Spot-check endpoint:

GET /api/v1/dashboard/content-analysis/crop/{videoFileId}

Owner-only (crop detection is ffmpeg-bound and can take 30–60s on large sources — DoS protection).

Auto-detection per profile: set AutoDetectCrop = true and the planner runs crop detection on the source during the Plan stage. The result gets applied to all video variants in the ladder.

Intro / outro fingerprinting

Problem: TV shows with recurring intros and outros waste viewer time on replays. Players want to offer "skip intro / outro" buttons, but they need to know where the markers are.

Solution: IAudioFingerprinter + IIntroDetector work together.

Fingerprinting with Chromaprint

ffmpeg ships with the Chromaprint muxer — the same fingerprint used by AcoustID for music identification. The fingerprint is a compact hash of perceptual audio features, stable under re-encoding and minor volume shifts.

AudioFingerprint fp = await fingerprinter.FingerprintAsync(
    path,
    new FingerprintWindow(
        TimeSpan.Zero,
        TimeSpan.FromMinutes(3)
    ),
    ct
);

A FingerprintWindow constrains which part of the file gets fingerprinted — you don't need a fingerprint of the whole episode to detect the intro.

Intro detection across episodes

IIntroDetector compares fingerprints from multiple episodes of the same season and finds shared windows:

List<AudioFingerprint> intros = [/* first 3 min per episode */];
IntroMarker? intro = introDetector.DetectIntro(intros);

Returns an IntroMarker with start / end timestamps and a confidence score (0.0–1.0). The detector uses a sliding-window Hamming-distance match — so shifted-offset intros (an episode that starts its intro at 0:15 while another starts at 0:22) still cluster as the same intro.

Outro detection is the mirror: fingerprint the last 3 minutes of each episode, look for the shared tail.
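To make the sliding-window idea concrete, here is a Python sketch that aligns two fingerprints at every offset and keeps the longest run of frames whose Hamming distance stays under a tolerance. It assumes fingerprints are lists of 32-bit integers (the raw Chromaprint form); the function names, the tolerance of 8 bits, and the minimum run length are illustrative assumptions, not the detector's actual parameters.

```python
from typing import Optional


def bit_errors(a: int, b: int) -> int:
    """Hamming distance between two 32-bit fingerprint frames."""
    return bin(a ^ b).count("1")


def best_alignment(fp_a: list[int], fp_b: list[int],
                   max_bit_errors: int = 8,
                   min_run: int = 4) -> Optional[tuple[int, int, int]]:
    """Return (start_a, start_b, length) of the longest matching run, or None."""
    best = (0, 0, 0)
    # offset = index_in_a - index_in_b; try every overlap of the two sequences.
    for offset in range(-(len(fp_b) - 1), len(fp_a)):
        run = 0
        for i in range(max(0, offset), min(len(fp_a), offset + len(fp_b))):
            j = i - offset
            if bit_errors(fp_a[i], fp_b[j]) <= max_bit_errors:
                run += 1
                if run > best[2]:
                    best = (i - run + 1, j - run + 1, run)
            else:
                run = 0
    return best if best[2] >= min_run else None
```

Because every offset is tried, an intro that starts at a different position in each episode still produces one shared run; the two start indices give the per-episode offsets. Converting frame indices to seconds depends on the fingerprint's frame rate, which is left out here.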

Storage

Detection results go into the ContentSegments table:

| Column | Type | Example |
|---|---|---|
| Id | ULID | 01JGXZ... |
| EpisodeId / MovieId | int (one of) | 62085 |
| SegmentType | enum | Intro, Outro, Recap, Credits |
| StartSeconds | double | 45.2 |
| EndSeconds | double | 120.8 |
| Source | string | chromaprint, manual, tmdb |
| Confidence | double | 0.87 |

Editing a segment via the /content-segments/{id} API flips Source to "manual" so the next detector run doesn't clobber the user's correction.

Manual + mixed workflows

Dashboard-facing API:

GET  /api/v1/content-segments/episode/{id}
GET  /api/v1/content-segments/movie/{id}
POST /api/v1/content-segments              (create)
PUT  /api/v1/content-segments/{id}         (edit — flips Source to "manual")
DELETE /api/v1/content-segments/{id}

Players consume the episode / movie endpoints to offer skip buttons.

Spot-check endpoint for testing the detector on a specific season:

POST /api/v1/dashboard/content-analysis/intro/{seasonId}

Runs the full detection pipeline and returns the detected markers without writing to the database. Owner-only — fingerprinting every episode is minutes of ffmpeg work per episode.

Subtitle OCR

Problem: Bluray sources ship with PGS (bitmap) subtitles. Apple and most smart TVs can render them, but web players can't — WebVTT is the common format. DASH / HLS can only carry text subs in their sidecar playlists.

Solution: ISubtitleOcrEngine runs Tesseract against rendered subtitle frames via ffmpeg's ocr filter.

SubtitleTrack track = await ocrEngine.OcrAsync(
    path,
    streamIndex: 3,
    language: "eng",
    outputCodec: SubtitleCodecType.WebVtt,
    ct
);

Language models

Tesseract needs .traineddata model files per language. The encoder downloads missing models on demand:

GET /api/v1/dashboard/content-analysis/ocr-models    (list + status)
POST /api/v1/dashboard/content-analysis/ocr-models/{lang}/download

Models live under EncoderOptions.TesseractModelsDirectory. Default language pack: eng. Add more as needed; the model manager fetches from the official Tesseract repo.

Supported bitmap formats

Via SubtitleClassifier:

  • hdmv_pgs_subtitle (Bluray)
  • dvd_subtitle (DVD)
  • dvb_subtitle (broadcast)

Spot-check endpoint

POST /api/v1/dashboard/content-analysis/ocr/{videoFileId}
       ?streamIndex=3&language=eng

Useful before enabling OCR on a library-wide re-encode — run the dashboard command on one file first, look at the output WebVTT, tune the language if needed.

Whisper speech transcription

Problem: some sources ship without subtitles at all. Manual transcription is hours of work; cloud APIs require uploading your library.

Solution: IWhisperTranscriber runs whisper.cpp locally against the first audio stream and produces WebVTT.

WhisperOptions opts = new(
    ModelPath: "/var/lib/nomercy/models/ggml-large-v3.bin",
    ModelSize: WhisperModelSize.LargeV3,
    TranslateToEnglish: false
);
SubtitleTrack track = await whisperTranscriber.TranscribeAsync(
    path, audioStreamIndex: 0,
    language: "jpn", options: opts,
    progress: null, ct
);

Model sizes

| Model | Size on disk | Speed | Quality |
|---|---|---|---|
| Tiny | 75 MB | Very fast | Word-recognition only |
| Base | 150 MB | Fast | Okay for clear speech |
| Small | 500 MB | Medium | Good for most content |
| Medium | 1.5 GB | Slow | Near-human accuracy |
| LargeV3 | 3 GB | Very slow | Best available |

LargeV3 is the recommended default — anything smaller noticeably misses specialised vocabulary (show-specific names, technical terms).

Translate vs transcribe

TranslateToEnglish: true transcribes in the source language AND translates to English in one pass. Useful for anime — produces English WebVTT from Japanese audio without a separate translation step.

Progress

Whisper is multi-minute work even on a good GPU. The IProgress<double> parameter reports fraction complete so the dashboard can render a progress bar.

Spot-check

POST /api/v1/dashboard/content-analysis/transcribe/{videoFileId}
        ?language=eng&translateToEnglish=false

Owner-only. Writes the WebVTT next to the source file.

Analysis subscribers

IntroDetectionSubscriber listens on the event bus for LibraryScanned / EpisodeAdded events and runs intro detection automatically for newly-scanned seasons. Cross-episode detection kicks in once a season has ≥2 encoded episodes on disk.

Other subscribers can hook the same events — the event bus supports plugin-contributed subscribers. The encoder ships with:

  • AutoEncodeSubscriber — watch folder, start encode when new files land in a directory with an assigned EncoderProfileFolder
  • IntroDetectionSubscriber — auto-detect intros when a season has ≥2 episodes with video files
  • EncodingNotificationSubscriber — fire configured webhook URLs on encode start / complete / fail

Combining analysis with playback

The content-analysis results end up in the media database + player UI:

  • Crop result → profile's video filter chain
  • Intro / outro markers → ContentSegments rows → player shows "skip intro" buttons at the right timestamps
  • OCR subtitles → WebVTT sidecars alongside the video variants
  • Whisper transcription → same WebVTT pipeline, just a different extraction source

Every result is editable via the dashboard — automated detection is a starting point, not the final word.

Subtitles and DRM

Two related areas: getting subtitle text out of a source in the right format for the output container, and optionally encrypting the output so only authorised clients can decode it.

Subtitle routing

Source subtitles fall into two categories: text-based (ASS, SRT, WebVTT, mov_text) and bitmap-based (PGS, DVD, DVB). The output container determines which routing path applies.

The decision matrix

| Source codec | Output format | Action | Rationale |
|---|---|---|---|
| text | MKV | Copy | MKV carries text codecs natively, stream-copy is lossless |
| text | HLS | Extract (WebVTT) | HLS only carries WebVTT in its playlist |
| text | MP4 | Extract (WebVTT / SRT) | MP4 sidecar subtitle files |
| text | DASH | Extract (WebVTT) | DASH manifests reference sidecar |
| bitmap | MKV | Copy | MKV is the only container that carries bitmap subs natively |
| bitmap | any other | Transcode | Needs OCR to text, or burn-in |

Plus the user can explicitly pick a SubtitleMode:

  • Extract — write WebVTT / SRT / ASS next to the video, reference from the playlist
  • BurnIn — render subtitles into the video frames (filter chain). Permanent — no toggle at playback time.
  • PassThrough — copy the subtitle stream verbatim into the container (only makes sense for MKV + DASH)
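The default routing from the decision matrix, before any explicit SubtitleMode override, can be sketched in Python as follows. The function name is hypothetical, and the codec names are assumed to be ffmpeg-style identifiers; the bitmap names are the ones listed under Subtitle OCR.

```python
BITMAP_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle", "dvb_subtitle"}
TEXT_CODECS = {"ass", "srt", "subrip", "webvtt", "mov_text"}  # assumed identifiers


def route_subtitle(source_codec: str, container: str) -> str:
    """Return 'Copy', 'Extract' or 'Transcode' per the decision matrix."""
    if source_codec not in BITMAP_CODECS and source_codec not in TEXT_CODECS:
        raise ValueError(f"unknown subtitle codec: {source_codec}")
    if container == "MKV":
        return "Copy"        # MKV carries both text and bitmap subs natively
    if source_codec in BITMAP_CODECS:
        return "Transcode"   # needs OCR to text, or burn-in
    return "Extract"         # text subs become sidecar files
```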

BurnIn specifics

BurnIn triggers a video filter chain. ASS burn-in needs libass (statically linked into nomercy-ffmpeg). PGS burn-in uses the overlay filter on rendered subtitle frames.

Caveats the validator catches:

  • BurnIn is permanent — warning surfaces so users know the output track has no toggleable subtitles
  • In an HLS ladder, burn-in applies to every variant that's tagged for the same language. Multi-language burn-in would need a separate variant ladder per language (expensive)

HLS WebVTT pipeline

When extracting for HLS, the encoder:

  1. Reads the source subtitle stream
  2. For text: converts to WebVTT via ffmpeg subtitle filter
  3. For bitmap: runs OCR (see Content analysis) to produce text, then converts to WebVTT
  4. Writes WebVTT files alongside the video variants
  5. References them in the master playlist with:

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",
  NAME="English",LANGUAGE="en",DEFAULT=YES,
  URI="subtitles/movie.en.webvtt.m3u8"

The sidecar playlist is itself a WebVTT-playlist (a tiny m3u8 referencing a single .vtt file per variant).

Preserving styling

ASS carries rich typesetting (positions, colours, fade effects). WebVTT has a much smaller subset. When extracting ASS → WebVTT for HLS, styling is lost — the validator warns so users who care about typesetting know to pick MKV output instead.

For anime in particular, MKV is the recommended output because ASS typesetting is central to the viewing experience and HLS can't carry it.

Attached fonts

MKV sources often ship with attached font files (.ttf / .otf) so renderers can match the original typesetting. IFontExtractor pulls them out and writes them to the output directory alongside the subtitle files. Playback in supported clients picks them up automatically.

Chapter writing

Sources with chapter metadata (Bluray rips, anime with opening / ending markers, documentaries with section breaks) get their chapters preserved in the output:

  • MKV: chapters carry over via stream-copy
  • MP4: written as chpl or nmhd atom via IChapterWriter
  • HLS: emitted as #EXT-X-DATERANGE tags with chapter identifiers

Chapters are separate from content-analysis intro/outro markers — the former is source metadata (authored by the content creator), the latter is derived by the audio fingerprinter.

AES-128 HLS encryption

For paid-tier content or commercial distribution, HLS supports segment-level AES-128 encryption. The Aes128HlsDrmProcessor handles key generation, IV management, and ffmpeg -hls_key_info_file integration.

How it works

  1. Before the encode starts, the processor generates a random 128-bit key + IV per profile (or reuses an existing one via DrmConfig)
  2. Writes a key_info file with:
    • Key URL that players will fetch
    • Local path to the binary key file
    • IV as a hex string
  3. Passes the key_info file to ffmpeg via -hls_key_info_file
  4. ffmpeg writes each .ts segment encrypted with AES-128-CBC
  5. The playlist includes #EXT-X-KEY:METHOD=AES-128,URI="...",IV=0x... pointing at the key URL
  6. Players fetch the key (over HTTPS, ideally with auth) and decrypt segments on the fly
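Steps 1 and 2 can be sketched in Python as below. The function name and file names are hypothetical; the three-line key_info layout (key URL, local key path, IV as hex) is the one described in step 2.

```python
import secrets
import tempfile
from pathlib import Path
from typing import Optional


def write_key_info(key_uri: str, key_path: Path, key_info_path: Path,
                   iv_hex: Optional[str] = None) -> str:
    """Create a random 128-bit key and the key_info file; return the IV used."""
    key_path.write_bytes(secrets.token_bytes(16))   # 16 bytes = 128 bits
    iv_hex = iv_hex or secrets.token_hex(16)        # 32 hex characters
    # Line 1: key URL for players, line 2: local key file, line 3: IV (hex).
    key_info_path.write_text(f"{key_uri}\n{key_path}\n{iv_hex}\n")
    return iv_hex


# Demonstration in a throwaway directory (paths are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "job.key"
    info_file = Path(tmp) / "job.keyinfo"
    iv = write_key_info("https://your-server/api/v1/drm/key/job", key_file, info_file)
    key_size = key_file.stat().st_size
    info_lines = info_file.read_text().splitlines()
```

The resulting key_info path is what gets handed to ffmpeg via -hls_key_info_file in step 3.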

DrmConfig

{
  "Drm": {
    "Scheme": "Aes128Hls",
    "KeyUri": "https://your-server/api/v1/drm/key/{jobId}",
    "KeyFilePath": "/var/nomercy/drm/keys/{jobId}.key",
    "IvHex": "0123456789abcdef0123456789abcdef"
  }
}

KeyUri is what ends up in the playlist. The server can gate access behind any auth layer — AES-128 HLS doesn't care about key distribution, it just expects the URL to return the key bytes.

IvHex is optional; when null the processor generates a random IV per encode.

Security model

  • Not DRM in the Widevine / PlayReady sense. AES-128 HLS protects against direct URL-to-stream scraping — a user who opens the .m3u8 URL gets encrypted segments they can't decode without the key.
  • Key delivery is the weak link. If the KeyUri is public, so is the content. The server MUST gate key delivery behind auth that's proportional to the content's sensitivity.
  • Segments exist unencrypted on disk during encoding. The encryption happens as ffmpeg writes. If your source-side storage is compromised the segments are exposed.
  • Works on every HLS-capable client: Safari, Chrome, iOS, Android, Apple TV, and most smart TVs.

CENC DASH (planned, not shipped)

DASH can use the Common Encryption (CENC) scheme for multi-DRM support: one encrypted mezzanine, separate Widevine / PlayReady / FairPlay license servers for different client families. Spec-compliant but needs:

  • mp4box or shaka-packager for segment-level encryption
  • License server integration (usually a paid service)
  • Certificate handling per DRM system

This is marked in the PRD as paid-tier work. AES-128 HLS covers the home / prosumer / casual-paywall use case.

Subtitle + DRM interaction

The validator enforces:

  • MP4 Extract mode: WebVTT or SRT only
  • HLS Extract mode: WebVTT only (ASS warned as lossy conversion)
  • MKV / DASH: all subtitle codecs allowed

When DRM is enabled on an HLS encode, subtitle sidecars are NOT encrypted — they're small and not considered content worth protecting. The video segments are where the encryption matters.

Live Transcode

Some clients can't play the source file directly — an Apple TV that can't decode AV1, a phone with a 720p screen streaming a 4K source, a browser that wants HLS when the source is raw MKV. Live transcode runs a real-time ffmpeg pipeline that produces a playable HLS stream for exactly that client, on demand.

When does it kick in

The web player detects client capabilities at session start. When the source doesn't match what the client can play, it asks the server for a live transcode:

POST /api/v1/streaming/live/sessions
{
  "video_file_id": "01JH3X...",
  "client_capabilities": {
    "video_codecs": ["h264", "hevc"],
    "audio_codecs": ["aac", "mp3"],
    "max_resolution": "1080p",
    "hdr_support": "none"
  }
}

The LiveStreamingService creates a LiveSession:

  1. Picks a quality profile matching the client's constraints (via LiveQualitySelector)
  2. Spawns an ILiveFfmpegRunner that starts ffmpeg writing HLS segments into a session-scoped temp dir
  3. Returns { session_id, playlist_url } to the client

The client requests the playlist URL and the HLS player pulls segments as they become available. The session ends when the client disconnects.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                      LiveStreamingService                        │
│  Creates sessions, tracks active ones, cleans up on timeout      │
└───────────────────────────┬──────────────────────────────────────┘
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                         LiveSession                              │
│  Holds session state, CTS for cancellation, playback position    │
└───────────────────────────┬──────────────────────────────────────┘
                            ▼
                ┌───────────┴───────────┐
                ▼                       ▼
┌───────────────────────┐   ┌───────────────────────┐
│  ILiveFfmpegRunner    │   │  LivePlaylistBuilder  │
│  Spawns + monitors    │   │  Reads index.m3u8,    │
│  ffmpeg process       │   │  emits segment events │
└───────────────────────┘   └───────────────────────┘

What the ffmpeg command looks like

ffmpeg -ss 0 -i /media/movies/source.mkv \
       -c:v h264_nvenc -preset p4 -b:v 6000k \
       -c:a aac -b:a 192k -ac 2 \
       -f hls -hls_time 4 -hls_playlist_type event \
       -hls_segment_filename 'segment_%03d.ts' \
       -progress pipe:1 \
       /tmp/nomercy-live/{session_id}/index.m3u8

Notable details:

  • -hls_playlist_type event — event playlists grow as segments are written, no rewrites. Live-style playback semantics even though the source is a finite file.
  • -hls_time 4 — 4-second segments for low latency. Standard VOD HLS uses 6; live transcode trades segment count for startup time.
  • -progress pipe:1 — structured progress output the runner parses

Session lifecycle

┌──────────┐   POST /sessions   ┌──────────┐
│  Client  │───────────────────▶│  Server  │
└──────────┘                    └────┬─────┘
                                      │
                                      │ create session
                                      │ spawn ffmpeg
                                      ▼
                                ┌──────────┐
                                │  Active  │
                                └────┬─────┘
                                      │
                GET /playlist.m3u8    │
      ◀─────────────────────────────┤
                                      │
                GET /segment/0.ts     │
      ◀─────────────────────────────┤
                                      │
                POST /position        │
      ─────────────────────────────▶│ update playback position
                                      │ (for pause/resume)
                                      │
                DELETE /sessions/{id} │
      ─────────────────────────────▶│ cancel ffmpeg
                                      │ clean up temp dir
                                      ▼
                                ┌──────────┐
                                │  Ended   │
                                └──────────┘

Cancellation

Client disconnects (browser tab closes, player quits) → client calls DELETE → LiveSession.DisposeAsync() → runner CTS fires → ffmpeg process killed → temp dir deleted.

Ghost sessions (client crashed without calling DELETE) get cleaned up by the SessionManager after an idle timeout. Default is 5 minutes of no playlist / segment requests.

Buffer management

The BufferManager watches the client's playback position vs the segments already generated. It emits BufferAction events:

| Action | Trigger | Effect |
|---|---|---|
| Suspend | Buffer > 30s ahead | Pause ffmpeg; wait for client to catch up |
| Resume | Buffer < 15s ahead after suspend | Unpause ffmpeg |
| DropQuality | Buffer < 5s | Ladder-aware: switch to lower bitrate variant |
| EmergencyDropQuality | Buffer < 3s | Switch to lowest variant |

This keeps the encoder from running far ahead while the client is paused (for example, to read subtitles) and ramps it back up when playback resumes.
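The thresholds form a simple hysteresis loop, sketched here in Python. The function name is hypothetical, and the precedence (a suspended encoder only ever resumes or stays suspended) is an assumption; the threshold values are the ones in the table.

```python
from typing import Optional


def buffer_action(buffer_seconds: float, suspended: bool) -> Optional[str]:
    """Map seconds of encoded-but-unplayed content to a BufferAction name."""
    if suspended:
        # While suspended the only transition is back to running.
        return "Resume" if buffer_seconds < 15 else None
    if buffer_seconds > 30:
        return "Suspend"
    if buffer_seconds < 3:
        return "EmergencyDropQuality"
    if buffer_seconds < 5:
        return "DropQuality"
    return None  # healthy buffer, nothing to do
```

The gap between the 30s suspend and 15s resume thresholds stops ffmpeg from being paused and unpaused on every segment.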

Seek handling

HLS segments are sequential. When a client seeks, the current live session can't jump ahead — it's already encoded past that point OR not yet reached it.

Current behavior: seek closes the session and creates a new one from the seek timestamp. Session creation is fast (ffmpeg spawn is ~1s on a decent box) so the user experience is "brief pause, new stream continues."

Future work noted in the PRD: transparent seek within an existing session by repositioning the ffmpeg input — not shipped yet.

Protocol extensions

The Protocol subdirectory contains the request / response types for the session-management API. It's versioned via Asp.Versioning — v1.0 is the current stable shape.

The transport is simple HTTP but nothing prevents switching to SignalR for bidirectional session control (server pushes "buffer full, pausing" notifications). The current polling-only design is simpler to debug.

Concurrent sessions

Each session is an ffmpeg process. GPU encoders have fixed concurrent session limits (NVENC: 12 per card, QSV: ~8). The SessionManager tracks active sessions per GPU and refuses new ones when the limit is reached, with a clear error:

{
  "error": "GPU encoder capacity exhausted",
  "details": "Card 'RTX 4080' is running 12 concurrent NVENC sessions. Wait for an existing playback to end or fall back to software."
}

CPU sessions don't have hard limits — the dispatcher just gets slower as the CPU saturates.
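A Python sketch of per-GPU session accounting, under the assumption that each card has a fixed limit known up front. The class and method names are hypothetical and the error payload is abbreviated from the example above.

```python
from typing import Optional


class GpuSessionTracker:
    """Counts active encoder sessions per GPU and refuses past the limit."""

    def __init__(self, limits: dict[str, int]):
        self._limits = dict(limits)
        self._active = {gpu: 0 for gpu in limits}

    def try_acquire(self, gpu: str) -> Optional[dict]:
        """Reserve a slot; return an error payload when the card is full."""
        if self._active[gpu] >= self._limits[gpu]:
            return {
                "error": "GPU encoder capacity exhausted",
                "details": f"Card '{gpu}' is running {self._active[gpu]} concurrent sessions.",
            }
        self._active[gpu] += 1
        return None

    def release(self, gpu: str) -> None:
        """Free a slot when a session ends (never drops below zero)."""
        self._active[gpu] = max(0, self._active[gpu] - 1)
```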

Where the live cache lives

EncoderOptions.LiveTranscodeCachePath — defaults to {temp}/nomercy-live. Each session gets a subdirectory {cache}/{session_id}/ holding the in-flight playlist + segments.

On session end the subdirectory is deleted. On server restart, any leftover directories get swept — they're by definition orphaned since no sessions persist across restarts.

Difference from file encoding

Live transcode and file encoding share the EncodingStrategy + ffmpeg execution layers but:

  • Live has no Finalize stage — no stitching, no master-playlist cleanup (event playlists are the final form)
  • Live has no checkpoints — resume isn't meaningful for a session tied to a live client
  • Live skips the full Analyze stage — the client already said what it wants, the planner takes a shortcut

The Live-specific code lives in NoMercy.Encoder.LiveTranscode. It runs in the same server process as file encoding — sessions compete for the same GPU / CPU budget. The SessionManager manages that contention.

Disc Ripping

Physical media imports — Bluray, DVD, HD DVD — via the same encoder stack that handles file encoding. The DiscRipping subsystem detects discs as they're inserted, lists the titles on the disc, and rips selected titles to intermediate .mkv files the regular encoding pipeline can pick up.

Drive monitoring

IDriveMonitor polls the OS's list of optical drives every 3 seconds and emits DriveEvents:

| Event | When |
|---|---|
| DriveAdded | A new CDRom drive appeared (USB Bluray drive plugged in) |
| DriveRemoved | A drive went away |
| DiscInserted | A drive that was empty now has media |
| DiscEjected | A drive that had media is now empty |

Singleton scope — state persists across MonitorAsync() enumerations so the poller knows what changed since last check.
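The "what changed since last check" step amounts to diffing two snapshots. A Python sketch, assuming each snapshot maps a drive name to whether it currently holds media; the function name and snapshot shape are hypothetical, and emitting only DriveRemoved for a drive that vanishes with a disc inside is an assumption.

```python
def diff_drives(previous: dict[str, bool], current: dict[str, bool]) -> list[tuple[str, str]]:
    """Compare two {drive: has_media} snapshots and list the resulting events."""
    events = []
    for drive in sorted(previous.keys() | current.keys()):
        was, now = previous.get(drive), current.get(drive)
        if was is None:
            events.append(("DriveAdded", drive))
            if now:
                events.append(("DiscInserted", drive))
        elif now is None:
            events.append(("DriveRemoved", drive))
        elif now and not was:
            events.append(("DiscInserted", drive))
        elif was and not now:
            events.append(("DiscEjected", drive))
    return events
```

Keeping the previous snapshot in a singleton is what lets each poll produce only the delta.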

Cross-platform: uses DriveInfo.GetDrives() filtered to DriveType.CDRom — works on Windows, Linux (udev), and macOS without platform-specific code.

Disc scanning

When a disc is inserted, IDiscScanner reads the title structure via ffprobe's bluray / dvdread pseudo-URLs:

ffprobe -v quiet -print_format json -show_format \
        -show_streams bluray:/mnt/bluray

Returns a DiscInfo record:

{
  "mount_point": "/mnt/bluray",
  "type": "Bluray",
  "titles": [
    {
      "index": 0,
      "duration_seconds": 7842,
      "chapters": 32,
      "video_streams": [
        { "codec": "hevc", "width": 3840, "height": 2160,
          "hdr": true, "dolby_vision": true }
      ],
      "audio_streams": [
        { "codec": "truehd", "channels": 8, "language": "eng" },
        { "codec": "ac3", "channels": 6, "language": "fra" }
      ],
      "subtitle_streams": [
        { "codec": "hdmv_pgs_subtitle", "language": "eng", "forced": false }
      ]
    },
    { "index": 1, "duration_seconds": 120, "..." },
    { "index": 2, "duration_seconds": 180, "..." }
  ]
}

Discs typically have a main feature (longest title, usually 1 to 3 hours) plus short titles for menus, trailers, extras. The scanner lists everything; the user picks what to rip.
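A UI could pre-select the likely main feature from the DiscInfo record before the user confirms. The sketch below assumes the JSON shape shown above; `pick_main_feature` and its one-hour threshold are hypothetical and only illustrate the "longest title" heuristic, the user still makes the final choice.

```python
from typing import Optional


def pick_main_feature(disc_info: dict, min_seconds: int = 3600) -> Optional[dict]:
    """Suggest the longest title that is at least min_seconds long, if any."""
    titles = disc_info.get("titles", [])
    candidates = [t for t in titles if t.get("duration_seconds", 0) >= min_seconds]
    if not candidates:
        return None  # nothing feature-length; leave the choice entirely to the user
    return max(candidates, key=lambda t: t["duration_seconds"])


disc = {
    "mount_point": "/mnt/bluray",
    "type": "Bluray",
    "titles": [
        {"index": 0, "duration_seconds": 7842},
        {"index": 1, "duration_seconds": 120},
        {"index": 2, "duration_seconds": 180},
    ],
}
suggested = pick_main_feature(disc)
```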

The ripper

IDiscRipper wraps ffmpeg with disc-specific arguments:

ffmpeg -playlist 0 -i bluray:/mnt/bluray \
       -map 0:v:0 -map 0:a:0 -map 0:s:0 \
       -c copy \
       -copyts \
       /rip-output/title_00.mkv

Key details:

  • -playlist N selects the Bluray playlist (= title index)
  • -c copy stream-copies everything — no re-encoding at rip time. The intermediate MKV contains the exact source bitstream.
  • -map selections let the user opt in to or out of specific audio + subtitle streams. Default: all streams; the user can narrow the selection via the dashboard UI before starting the rip. The example above keeps only the first video, audio, and subtitle stream.
  • -copyts preserves timestamps — matters for chapters + subs staying in sync

Output filename: {outputDir}/title_NN.mkv where NN is the zero-padded title index. Simple scheme — the regular file encoder picks these up automatically if the output directory is a watched folder with an assigned EncoderProfileFolder.
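To make the argument layout and the naming scheme concrete, here is a hedged sketch of how a wrapper might assemble the ffmpeg argument list. `build_rip_command` is a hypothetical helper, not the IDiscRipper implementation; it uses only standard ffmpeg options (-playlist, -map, -c copy, -copyts), and mapping all streams when no selection is given mirrors the default described above.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Sequence


def build_rip_command(
    mount_point: str,
    title_index: int,
    output_dir: str,
    audio: Optional[Sequence[int]] = None,
    subtitles: Optional[Sequence[int]] = None,
) -> List[str]:
    """Build an ffmpeg stream-copy command for one Bluray title."""
    # title_NN.mkv, with NN zero-padded to two digits.
    output = PurePosixPath(output_dir) / f"title_{title_index:02d}.mkv"

    args = ["ffmpeg", "-playlist", str(title_index), "-i", f"bluray:{mount_point}"]
    args += ["-map", "0:v"]  # all video streams

    if audio is None:
        args += ["-map", "0:a"]  # default: every audio stream
    else:
        for i in audio:
            args += ["-map", f"0:a:{i}"]

    if subtitles is None:
        args += ["-map", "0:s?"]  # every subtitle stream; '?' tolerates none
    else:
        for i in subtitles:
            args += ["-map", f"0:s:{i}"]

    args += ["-c", "copy", "-copyts", str(output)]
    return args


cmd = build_rip_command("/mnt/bluray", 0, "/rip-output", audio=[0], subtitles=[0])
```

Passing the arguments as a list rather than a shell string avoids quoting problems with mount points that contain spaces.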

Rip-then-encode pipeline

The common flow:

  1. Insert disc. DriveMonitor fires DiscInserted.
  2. Scan. UI shows disc titles, user picks the main feature + optional extras, chooses which audio / subtitle tracks to keep.
  3. Rip. DiscRipper stream-copies to intermediate MKV(s). Duration: typically 30–60 min for a feature film; Bluray read speed is the bottleneck, not CPU.
  4. Auto-encode. The AutoEncodeSubscriber watches the rip output directory. When a new MKV lands, it dispatches an encoding job with the profile assigned to that folder.
  5. Content analysis. Post-encode subscribers run crop detection, intro / outro fingerprinting, OCR on bitmap subs, etc.

Total time on a typical feature film:

  • Rip: 30–60 min (bluray read speed)
  • Encode with hardware: 20–40 min per variant
  • Analysis: a few minutes

Metadata

The ripper doesn't yet resolve metadata (movie title, director, year, cover art, etc.) — that's a separate IDiscMetadataResolver interface currently scaffolded but not implemented. The ripped MKVs land in a folder keyed by disc type + scan timestamp; the user currently moves + renames post-rip based on the content.

Future work in the PRD: auto-query TMDB with the main title's duration + any disc-embedded metadata, suggest a folder structure, let the user confirm before moving.

Supported media

  • Bluray — via libbluray. Region-free discs work out of the box. Region-locked Blurays need an appropriate drive firmware (not a software concern).
  • DVD — via libdvdread. Both CSS-encrypted and unencrypted.
  • HD DVD — technically supported via generic ffmpeg input but not explicitly tested; the format is dead enough that we don't guarantee it
  • AVCHD camcorder discs — work via the Bluray scanner
  • Data discs with video files — not handled by the ripper; treated as a regular filesystem mount

Known limitations

  • AACS / BD+ protected discs require a compatible drive + key management outside the scope of this encoder. The ripper reads the decrypted stream once the drive has decrypted it; it doesn't do key retrieval itself.
  • No transcoding at rip time. Deliberately — stream-copy is lossless and reversible. The re-encode happens later, against the ripped MKV, with whatever profile the user picked.
  • One rip at a time per drive. Optical drives can't read two titles in parallel; the scanner is single-threaded per drive. Multiple drives on the same host can rip concurrently.
  • No drive-specific tuning. Read speed is whatever the drive defaults to. Fast-reading LG / Asus drives rip faster than conservative-reading drives — not much the encoder can do about it.

Security

  • The ripper runs with DriveInfo + mounted filesystem access — no elevated privileges required on Linux if the user is in the cdrom group
  • Output paths check against the PathAllowlist — the ripper can't write outside configured output directories
  • Disc content is lower-risk input than arbitrary network media, but the scanner still runs in a restricted ffprobe invocation with no filter-chain eval
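The output-path check can be pictured as "resolve, then test containment". This is a sketch under assumptions: `is_allowed_output` is a hypothetical name, and the real PathAllowlist may apply additional rules. Resolving first matters because it neutralizes `..` segments before the comparison.

```python
from pathlib import Path
from typing import Iterable


def is_allowed_output(candidate: str, allowed_roots: Iterable[str]) -> bool:
    """Return True only if candidate resolves to a path inside an allowed root."""
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        # is_relative_to compares whole path components, so
        # /rip-output-evil is not treated as being inside /rip-output.
        if resolved.is_relative_to(Path(root).resolve()):
            return True
    return False
```

A plain string prefix test would accept sibling directories that merely share a prefix, which is why the comparison is done on path components.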

Distributed Encoding

One coordinator server dispatches encode tasks to one or more remote worker servers. Transparent when no workers are registered (everything runs local). Scales out when workers are added.

Who this is for

  • Prosumers with a media server + a workstation that has a better GPU. Route encode tasks to the workstation, keep the media server lean.
  • Small studios with multiple machines and large libraries. Chop encode queues across the fleet.
  • Anyone with idle hardware they want to contribute to encoding their own content.

Quick start

Both machines need the same DistributedEncodingSigningKey in EncoderOptions. Generate one:

openssl rand -base64 32

Coordinator appsettings.json:

{
  "Encoder": {
    "DistributedEncodingSigningKey": "<shared-key>"
  }
}

Worker appsettings.json:

{
  "Encoder": {
    "DistributedEncodingSigningKey": "<same-key>",
    "CoordinatorUrl": "https://coordinator.example.com:7626",
    "WorkerSelfBaseUrl": "https://this-worker.example.com:7626",
    "WorkerId": "beast-unit"
  }
}

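Start both. The worker auto-registers, and the coordinator's /api/v1/dashboard/workers endpoint now lists it. The next encode that goes through RemoteWorkerDispatcher can use both machines; built-in strategies still run locally for now (see "What's not in this milestone").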

Architecture

                   ┌───────────────────────────────────────────┐
                   │               Coordinator                 │
                   │  (your regular NoMercy media server)      │
                   │                                           │
                   │  ┌─────────────────────────────────────┐  │
                   │  │     RemoteWorkerDispatcher          │  │
                   │  │  Picks worker per task, handles     │  │
                   │  │  retry chain + local fallback       │  │
                   │  └──────────────┬──────────────────────┘  │
                   │                 │                         │
                   │  ┌──────────────▼──────────────────────┐  │
                   │  │  InMemoryRemoteWorkerRegistry       │  │
                   │  │  Active workers, health tracking,   │  │
                   │  │  cooldown eviction                  │  │
                   │  └─────────────────────────────────────┘  │
                   └──────────────────┬────────────────────────┘
                                      │ HTTP + HMAC
                                      │
               ┌──────────────────────┼──────────────────────┐
               │                      │                      │
       ┌───────▼────────┐     ┌───────▼────────┐     ┌───────▼────────┐
       │   Worker A     │     │   Worker B     │     │   Worker C     │
       │  (workstation) │     │  (laptop)      │     │  (NAS)         │
       │                │     │                │     │                │
       │  LocalWorker   │     │  LocalWorker   │     │  LocalWorker   │
       │  Dispatcher    │     │  Dispatcher    │     │  Dispatcher    │
       └────────────────┘     └────────────────┘     └────────────────┘

How tasks flow

  1. User starts an encode on the coordinator
  2. Strategy decomposes the job into EncodeTask[] — one per ABR variant, or one per time range for a two-pass chunked encode
  3. RemoteWorkerDispatcher.DispatchAsync(tasks, ct) fires:
     a. Reads registry.GetActiveWorkers(), which hides cooled-down workers
     b. WorkerAssigner load-balances tasks across workers based on speed × slots
     c. Dispatches each task to its assigned worker in parallel
  4. Worker receives signed task, runs it, returns signed result
  5. Coordinator assembles results, runs the Finalize stage locally (stitching playlists, writing master manifests)

Security model

HMAC-signed payloads

Every coordinator → worker and worker → coordinator call that carries task data is HMAC-SHA256 signed with the shared key. The signature provides integrity, not confidentiality: an attacker who intercepts the traffic can read payloads but can't modify them without the signature check failing.

5-minute replay window

Every signed payload carries a UTC timestamp. Requests older than 5 minutes are rejected. Stops someone capturing a signed task and replaying it days later.
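Signing and the replay window fit together as sketched below. This is an illustration only: the envelope layout (`body` plus `signature`), the canonical JSON form, and the function names are assumptions, not the TaskSerializer wire format. What it does take from the text is HMAC-SHA256, a shared key, a UTC timestamp in the payload, and rejection after 5 minutes.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 5 * 60  # the 5-minute window described above


def _canonical(body: dict) -> bytes:
    # Assumed canonical form: sorted keys, compact separators.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_payload(payload: dict, key: bytes, now: Optional[float] = None) -> dict:
    """Attach a UTC timestamp and an HMAC-SHA256 signature over the body."""
    timestamp = int(time.time() if now is None else now)
    body = {**payload, "timestamp": timestamp}
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_payload(envelope: dict, key: bytes, now: Optional[float] = None) -> bool:
    """Check the signature first, then reject anything older than the window."""
    expected = hmac.new(key, _canonical(envelope["body"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, envelope["signature"]):
        return False
    current = time.time() if now is None else now
    return current - envelope["body"]["timestamp"] <= REPLAY_WINDOW_SECONDS
```

Because the timestamp is inside the signed body, an attacker cannot refresh it on a captured request without invalidating the signature.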

Library-allowlisted source fetches

When a worker pulls source files from the coordinator (see "File transfer" below), the coordinator checks the requested path against the VideoFiles table. Only paths that correspond to known library files get served. A leaked signing key doesn't turn the coordinator into a general file-read oracle.

HTTPS required for non-loopback

The /workers/register endpoint rejects non-HTTPS worker URLs unless the target is a loopback address. Local dev can use HTTP on 127.0.0.1; WAN deployments must use TLS.
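The registration rule can be expressed as a small URL check. The sketch below is an assumption about how such a check might look, not the endpoint's actual code; `is_acceptable_worker_url` is a hypothetical name, and treating the literal hostname `localhost` as loopback is a choice made here for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def is_acceptable_worker_url(base_url: str) -> bool:
    """Accept any HTTPS URL; accept plain HTTP only for loopback targets."""
    parts = urlsplit(base_url)
    host = parts.hostname
    if not host:
        return False
    if parts.scheme == "https":
        return True
    if parts.scheme != "http":
        return False
    if host == "localhost":  # assumption: the name is treated as loopback
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a non-loopback hostname over plain HTTP
```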

Progress payloads are unauthenticated

The progress push endpoint accepts anonymous POSTs. Rationale: progress bodies contain no secrets, and spoofing just moves a fake progress bar. Real task dispatch + source fetch still require HMAC.

Self-registration

WorkerSelfRegistrationService is a hosted background service that runs on workers:

On boot:
  POST /api/v1/dashboard/workers/register
  body: {
    worker_id, base_url, cpu_cores,
    available_cpu_threads, available_gpu_slots, gpus
  }

Every WorkerHeartbeatInterval (default 20s):
  POST /api/v1/dashboard/workers/{id}/heartbeat

On clean shutdown:
  DELETE /api/v1/dashboard/workers/{id}

Failure handling:

  • Initial registration fails → service logs warning, keeps retrying on the heartbeat loop (no crash)
  • Heartbeat returns 404 → coordinator doesn't know us, assume coordinator restart or late eviction after outage, re-register
  • Coordinator unreachable → heartbeats fail silently, coordinator's stale eviction removes us after 60s; when connection restores, auto re-register

No config = service exits cleanly. Standalone installs have this service registered but it no-ops.

Health tracking

Every task's outcome is reported back to the registry via RecordTaskOutcome:

  • 3 consecutive failures → worker enters a 2-minute cooldown
  • Any success clears the counter
  • Re-registration clears the cooldown explicitly
  • Cooldowns auto-expire

During cooldown:

  • GetActiveWorkers() hides the worker (dispatcher skips it)
  • GetAllWorkersWithHealth() still returns it with cooldown status so the dashboard can show "worker-x: cooldown, 3 failures, back at 12:05"
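The health rules above amount to a small per-worker state machine. The following sketch uses the stated thresholds (3 consecutive failures, 2-minute cooldown); the `WorkerHealth` class and its method names are hypothetical, and keeping the failure counter after the cooldown expires is an assumption, since the text only says that a success or a re-registration clears it.

```python
from typing import Optional


class WorkerHealth:
    FAILURE_THRESHOLD = 3
    COOLDOWN_SECONDS = 120

    def __init__(self) -> None:
        self.consecutive_failures = 0
        self.cooldown_until: Optional[float] = None

    def record_outcome(self, success: bool, now: float) -> None:
        if success:
            self.consecutive_failures = 0  # any success clears the counter
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.FAILURE_THRESHOLD:
            self.cooldown_until = now + self.COOLDOWN_SECONDS

    def on_reregister(self) -> None:
        # Re-registration clears the cooldown explicitly.
        self.consecutive_failures = 0
        self.cooldown_until = None

    def is_active(self, now: float) -> bool:
        # Cooldowns auto-expire: no timer is needed, only a comparison.
        return self.cooldown_until is None or now >= self.cooldown_until
```

Expiry by comparison means the registry does not need a background job to bring workers back; the next call that asks for active workers simply sees the worker again.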

Dashboard view:

GET /api/v1/dashboard/workers

{
  "distribution_enabled": true,
  "count": 3,
  "active_count": 2,
  "data": [
    {
      "worker_id": "beast-unit",
      "available_gpu_slots": 12,
      "cpu_cores": 24,
      "status": "active",
      "consecutive_failures": 0
    },
    {
      "worker_id": "flaky-laptop",
      "status": "cooldown",
      "consecutive_failures": 3,
      "cooldown_until_utc": "2026-04-17T12:05:00Z"
    },
    { "worker_id": "nas", "status": "active", ... }
  ]
}

Retry chain

The dispatcher tries up to 2 remote workers per task before falling back to local:

Task T → worker A (initial pick)
  ├─ Success: return result, done
  ├─ Worker reports failure OR throws
  │    ▼
  │  Task T → worker B (next-best by slots)
  │    ├─ Success: return result
  │    └─ Still failing
  │         ▼
  │       Task T → local dispatcher (always succeeds if source is valid)

The retry is ONLY for this task. Other tasks continue on their original workers in parallel. A single bad GPU doesn't stall the whole job.
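In code, the chain is a bounded loop over ranked workers with a local fallback. This is a simplified, synchronous sketch: the worker callables, the `{"success": ...}` result shape, and `run_with_fallback` are assumptions for illustration. Only the limit of 2 remote attempts and the local fallback come from the text.

```python
from typing import Callable, Dict, Sequence

MAX_REMOTE_ATTEMPTS = 2  # matches "up to 2 remote workers per task"

Runner = Callable[[str], Dict]


def run_with_fallback(task_id: str, ranked_workers: Sequence[Runner], local: Runner) -> Dict:
    """Try the best-ranked remote workers in order, then fall back to local."""
    for worker in ranked_workers[:MAX_REMOTE_ATTEMPTS]:
        try:
            result = worker(task_id)
        except Exception:
            continue  # a thrown error counts the same as a reported failure
        if result.get("success"):
            return result
    return local(task_id)
```

Each task carries its own attempt budget, so one failing worker only delays the tasks assigned to it.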

File transfer

When coordinator + workers share storage (shared NAS, SMB mount), workers see the task's input path on their own filesystem. Zero network transfer for source files.

When they don't — WAN deployment, cloud worker, etc. — the HttpSourceFetcher kicks in:

  1. Worker receives signed task, checks File.Exists(InputPath)
  2. If missing, builds a signed download URL:
    GET /api/v1/worker-source?path=<path>&ts=<now>&sig=<hmac(path|ts, key)>
    
  3. Coordinator verifies signature + timestamp + library allowlist, streams the file via PhysicalFile with range-processing enabled
  4. Worker writes to {cache}/remote-sources/{task-id}.{ext} — streamed straight to disk, no memory load
  5. Worker rewrites task command args to swap the original path for the cached path
  6. Encode runs
  7. Finally block: ISourceFetcher.ReleaseAsync deletes the cached file

Retries reuse the cached file — downloading a 4K source once per attempt would be wasteful.

Shared-storage installs swap HttpSourceFetcher for NullSourceFetcher in DI — it just returns the original path unchanged. No code changes needed; config-driven.
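The worker-side steps (fast path, cache location, argument rewrite, release) can be sketched as follows. All function names here are hypothetical and the download itself is injected as a callable, so this shows the control flow only, not the HttpSourceFetcher implementation or its signed request.

```python
import os
from pathlib import Path
from typing import Callable, List


def cached_source_path(cache_dir: str, task_id: str, input_path: str) -> Path:
    # {cache}/remote-sources/{task-id}.{ext}; Path.suffix already includes the dot.
    return Path(cache_dir) / "remote-sources" / f"{task_id}{Path(input_path).suffix}"


def resolve_source(
    input_path: str, task_id: str, cache_dir: str, download: Callable[[str, Path], None]
) -> str:
    """Return a locally readable path for the task's source file."""
    if os.path.exists(input_path):
        return input_path  # shared storage: nothing to transfer
    target = cached_source_path(cache_dir, task_id, input_path)
    if not target.exists():  # a retry finds the earlier download and skips this
        target.parent.mkdir(parents=True, exist_ok=True)
        download(input_path, target)
    return str(target)


def rewrite_args(args: List[str], original: str, replacement: str) -> List[str]:
    """Swap the original input path for the cached one in the command args."""
    return [replacement if arg == original else arg for arg in args]


def release(path: str, cache_dir: str) -> None:
    """Delete the file only if it lives in the remote-sources cache."""
    candidate = Path(path)
    if candidate.parent == Path(cache_dir) / "remote-sources" and candidate.exists():
        candidate.unlink()
```

The guard in `release` keeps the cleanup step from ever deleting a library file that was served through the shared-storage fast path.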

Live progress

While a task runs, the HttpTaskProgressSink on the worker POSTs progress snapshots to the coordinator every 2 seconds:

POST /api/v1/dashboard/workers/{id}/tasks/{task-id}/progress
{
  "task_id": "...",
  "percent_complete": 42.5,
  "current_fps": 180,
  "current_speed": 6.0,
  "current_stage": "encode",
  "elapsed_seconds": 128.3,
  "estimated_remaining_seconds": 174.2,
  "current_time_seconds": 850.0,
  "duration_seconds": 2000.0,
  "bitrate_kbps": 5200
}

Coordinator's InMemoryTaskProgressStore caches the latest snapshot per task with 15-minute stale eviction. Dashboard reads:

GET /api/v1/dashboard/workers/tasks/progress

{ "count": 3, "data": [ { "task_id": ..., "percent_complete": ... } ] }
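A latest-wins store with stale eviction needs very little machinery. The sketch below is an assumption about the general shape, not the InMemoryTaskProgressStore code; the `ProgressStore` class and the injectable clock are hypothetical, while the 15-minute threshold and the `count` / `data` response shape come from the text above.

```python
import time
from typing import Callable, Dict, Tuple

STALE_AFTER_SECONDS = 15 * 60


class ProgressStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def put(self, snapshot: dict) -> None:
        # Latest wins: a newer snapshot for the same task replaces the old one.
        self._entries[snapshot["task_id"]] = (self._clock(), snapshot)

    def list_active(self) -> dict:
        now = self._clock()
        # Drop snapshots that have not been refreshed within the stale window.
        self._entries = {
            task_id: entry
            for task_id, entry in self._entries.items()
            if now - entry[0] < STALE_AFTER_SECONDS
        }
        data = [snapshot for _, snapshot in self._entries.values()]
        return {"count": len(data), "data": data}
```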

Fire-and-forget on the worker side — ffmpeg's progress thread never blocks on a slow coordinator. Failed pushes are logged and swallowed (progress is best-effort; the encode's success doesn't depend on it).

Throttled to one POST per 2s per task — ffmpeg emits every ~500ms, the UI doesn't need more granularity.
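Throttling and error swallowing combine into a thin wrapper around the send call. This sketch is illustrative only: `ThrottledProgressSink`, the injected `send` callable, and the injectable clock are assumptions, and a real sink would log the swallowed exception rather than ignore it silently.

```python
import time
from typing import Callable, Dict


class ThrottledProgressSink:
    def __init__(
        self,
        send: Callable[[str, dict], None],
        min_interval: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent: Dict[str, float] = {}

    def report(self, task_id: str, snapshot: dict) -> bool:
        """Forward a snapshot unless one was sent for this task too recently."""
        now = self._clock()
        last = self._last_sent.get(task_id)
        if last is not None and now - last < self._min_interval:
            return False  # dropped; the next snapshot supersedes it anyway
        self._last_sent[task_id] = now
        try:
            self._send(task_id, snapshot)
        except Exception:
            pass  # best-effort: a failed push must never fail the encode
        return True
```

The throttle is tracked per task, so several tasks on one worker each keep their own 2-second cadence.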

Scaling hints

  • WorkerAssigner load-balances by SpeedMultiplier × AvailableSlots. Workers with more CPU + GPU capacity get more work. Heavy QualityVariant tasks schedule onto fast workers first; lighter TimeChunk tasks fill the remainder.
  • Speed index per (encoder × GPU × resolution) drives the SpeedMultiplier. A worker that has the higher-throughput AV1 encoder wins AV1 tasks even if it has fewer CPU cores.
  • Cooldown window is 2 minutes. Tunable via registry constructor args. Too short = thrashing in + out; too long = failed workers stay benched after they recover.
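One way to read "SpeedMultiplier × AvailableSlots" is as a per-worker capacity weight. The sketch below uses a greedy heuristic (heaviest task first, each to the worker with the lowest resulting load relative to capacity). This is a stand-in technique chosen for illustration, not the WorkerAssigner algorithm, and the task cost numbers are made up.

```python
from typing import Dict, List, Tuple


def assign_tasks(
    tasks: List[Tuple[str, float]],          # (task id, relative cost)
    workers: Dict[str, Tuple[float, int]],   # name -> (speed multiplier, slots)
) -> Dict[str, List[str]]:
    """Greedy weighted assignment: heavy tasks first, to the least-loaded worker."""
    capacity = {n: speed * slots for n, (speed, slots) in workers.items() if speed * slots > 0}
    if not capacity:
        raise ValueError("no worker with usable capacity")

    load = {name: 0.0 for name in capacity}
    assignment: Dict[str, List[str]] = {name: [] for name in capacity}

    for task_id, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        # Pick the worker whose load-to-capacity ratio stays lowest with this task.
        best = min(capacity, key=lambda n: (load[n] + cost) / capacity[n])
        load[best] += cost
        assignment[best].append(task_id)

    return assignment


plan = assign_tasks(
    [("v2160", 8), ("v1080", 4), ("v720", 2), ("v480", 1)],
    {"fast": (2.0, 4), "slow": (1.0, 2)},
)
```

With these example numbers the two heavy variants land on the fast worker and the lighter ones fill the slow worker, which matches the scheduling intent described in the first bullet.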

What's not in this milestone

  • No mTLS between coordinator and worker. HMAC-signed payloads are the full security story. Works on trusted LAN + HTTPS to the coordinator. WAN deployments should add a VPN or TLS client-cert layer externally.
  • No exponential backoff on retries. First worker fails, second worker tries immediately. If all remotes are flaky, the retry chain exhausts in seconds. Tune MaxRemoteAttempts (constant in RemoteWorkerDispatcher) or swap in a plugin dispatcher.
  • Source fetch isn't resumable across worker restarts. A worker crash mid-download discards the partial file; next attempt re-downloads from scratch. HTTP Range requests ARE enabled on the server — a fancier client could resume, but the current one streams straight to disk without checkpoint state.
  • Strategies don't auto-distribute yet. The dispatcher infrastructure is wired end-to-end. Existing single-machine strategies (HlsSinglePassStrategy et al.) still run whole jobs locally — they don't yet decompose into EncodeTask[] + dispatch. That's the final integration step needed to make distribution active for the built-in strategies. Plugin strategies can wire themselves up today.

Test coverage

54 tests across the distribution layer (the sum of the per-suite counts below):

  • RemoteWorkerDispatcher: 8 — no-workers, worker-success, worker-fail-retry, worker-throws-retry, multi-worker load, cancellation, first-fails-second-succeeds-no-local
  • TaskSerializer: 8 — round-trip task + result, tamper, wrong-key, malformed, empty, missing fields
  • HttpRemoteWorker: 6 — 200-signed, 500, connection-refused, unsigned-reject, cancellation, UpdateSnapshot
  • InMemoryRegistry: 13 — register / re-register / heartbeat / unregister / stale eviction / snapshot stability + health-tracking
  • WorkerSelfRegistrationService: 4 — disabled no-op, register-on-start, unregister-on-stop, heartbeat-404-reregister
  • DistributionEndToEnd: 2 — full round-trip, signing-key mismatch fallback
  • HttpSourceFetcher: 6 — fast path, remote download, retry reuses cache, non-success throws, release deletes cache, null fetcher
  • TaskProgress: 7 — store round-trip, latest-wins, stale filtering, sink no-ops without coordinator, sink throttles, dispatcher forwards progress, dispatcher swallows sink exceptions

Where to go next

The setup guide in the repo has operator-facing instructions for standing up a cluster: config snippets, verification commands, troubleshooting, scaling notes.
