I have a Jellyfin installation on a Kubernetes cluster. Playback works just fine via a web client, but playback to Kodi clients on other devices regularly hangs for ~20-30 seconds at a time, every 10 seconds or so of playback. There are no logs on the Jellyfin pod at the time of the hangs that indicate a reason; here are logs from a Kodi client covering the period of several hangs, during which on-screen debug logging showed no notable change in memory or CPU usage and FPS kept reporting 55-60. Absent any better ideas, I have attempted to enable Hardware Acceleration.
(I'm aware of the XY Problem - any advice for alternative methods of addressing the playback hangs other than Hardware Acceleration would also be gratefully received!)
I've been following the guides here and here attempting to enable hardware acceleration (all terminal commands on my node rather than on the Jellyfin pod unless specified otherwise):
* I've installed a Quadro P1000 video card on one of the nodes of my cluster, and confirmed it shows up:
Code:
$ lspci -nn | grep -Ei "3d|display|vga"
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P1000] [10de:1cb1] (rev a1)
0a:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. G200eR2 [102b:0534] (rev 01)
* I've installed the NVIDIA driver (and rebooted) and NVIDIA Container Toolkit on the node:
Code:
$ nvidia-smi
Sat Feb 8 15:09:11 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01 Driver Version: 535.216.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Quadro P1000 On | 00000000:05:00.0 Off | N/A |
| 34% 27C P8 N/A / N/A | 4MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
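For reference, the driver and Container Toolkit install was roughly the following (a sketch from memory - package names assume a Debian/Ubuntu node with the NVIDIA apt repositories already configured, and the driver version matches the nvidia-smi output above):
Code:
# NVIDIA driver, then reboot
$ sudo apt-get install -y nvidia-driver-535
$ sudo reboot
# NVIDIA Container Toolkit
$ sudo apt-get install -y nvidia-container-toolkit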
* I followed these instructions to install the nvidia runtime, created a RuntimeClass for it, and enabled GPU support:
Code:
$ cat /etc/containerd/config.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/usr/lib/cni"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.grpc.v1.cri".containerd]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
[plugins."io.containerd.internal.v1.opt"]
path = "/var/lib/containerd/opt"
$ kubectl describe runtimeClass nvidia
Name: nvidia
Namespace:
Labels: <none>
Annotations: <none>
API Version: node.k8s.io/v1
Handler: nvidia
Kind: RuntimeClass
Metadata:
Creation Timestamp: 2025-02-08T21:57:21Z
Resource Version: 283479326
UID: 48cd5220-fc08-4c2a-87bc-8dfb2a30cf69
Events: <none>
$ kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
nvidia-device-plugin-daemonset 3 3 1 3 1 <none> 68m
# The pod deployed to the node with the GPU
$ kubectl -n kube-system logs nvidia-device-plugin-daemonset-lxxw8
I0208 22:08:40.694494 1 main.go:235] "Starting NVIDIA Device Plugin" version=<
d475b2cf
commit: d475b2cfcf12b983a4975d4fc59d91af432cf28e
>
I0208 22:08:40.694591 1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I0208 22:08:40.694638 1 main.go:245] Starting OS watcher.
I0208 22:08:40.694850 1 main.go:260] Starting Plugins.
I0208 22:08:40.694874 1 main.go:317] Loading configuration.
I0208 22:08:40.695846 1 main.go:342] Updating config with default resource matching patterns.
I0208 22:08:40.696075 1 main.go:353]
Running with config:
{
"version": "v1",
"flags": {
"migStrategy": "none",
"failOnInitError": false,
"mpsRoot": "",
"nvidiaDriverRoot": "/",
"nvidiaDevRoot": "/",
"gdsEnabled": false,
"mofedEnabled": false,
"useNodeFeatureAPI": null,
"deviceDiscoveryStrategy": "auto",
"plugin": {
"passDeviceSpecs": false,
"deviceListStrategy": [
"envvar"
],
"deviceIDStrategy": "uuid",
"cdiAnnotationPrefix": "cdi.k8s.io/",
"nvidiaCTKPath": "/usr/bin/nvidia-ctk",
"containerDriverRoot": "/driver-root"
}
},
"resources": {
"gpus": [
{
"pattern": "*",
"name": "nvidia.com/gpu"
}
]
},
"sharing": {
"timeSlicing": {}
},
"imex": {}
}
I0208 22:08:40.696086 1 main.go:356] Retrieving plugins.
I0208 22:08:40.719070 1 server.go:195] Starting GRPC server for 'nvidia.com/gpu'
I0208 22:08:40.721000 1 server.go:139] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I0208 22:08:40.723731 1 server.go:146] Registered device plugin for 'nvidia.com/gpu' with Kubelet
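For completeness, this is roughly what that step amounted to (a sketch rather than a verbatim command history - the containerd runtime entry shown above is what nvidia-ctk generates, and the device plugin DaemonSet itself came from the upstream nvidia-device-plugin manifests):
Code:
$ sudo nvidia-ctk runtime configure --runtime=containerd
$ sudo systemctl restart containerd
$ cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF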
* Following these instructions, I confirmed that NVIDIA GPUs can be requested by a container:
Code:
# Intentionally _not_ using the runtimeClassName to show that this fails:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  # runtimeClassName: nvidia
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
      resources:
        limits:
          nvidia.com/gpu: 1
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
EOF
$ kubectl logs gpu-pod
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]
# Now run the same `apply` but with the addition of `runtimeClassName: nvidia`
$ kubectl logs gpu-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
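(As an extra sanity check that the device plugin is actually advertising the GPU, I also checked the node's capacity/allocatable resources - <gpu-node-name> below is a placeholder for whichever node has the P1000, and both lines should report 1:)
Code:
$ kubectl describe node <gpu-node-name> | grep nvidia.com/gpu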
* I've set runtimeClassName=nvidia for the Jellyfin pod, and explicitly set the environment variables from here:
Code:
$ kubectl -n jellyfin describe pod jellyfin-58b9dddd5d-5vh9c
...
Runtime Class Name: nvidia
...
Containers:
jellyfin:
...
Environment:
NVIDIA_DRIVER_CAPABILITIES: all
NVIDIA_VISIBLE_DEVICES: all
...
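(The relevant portion of the Jellyfin Deployment spec looks roughly like this - a trimmed sketch rather than the full manifest:)
Code:
spec:
  template:
    spec:
      runtimeClassName: nvidia
      containers:
        - name: jellyfin
          env:
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: "all"
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"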
* nvidia-smi on the Jellyfin pod shows the video card:
Code:
$ kubectl exec -it -n jellyfin jellyfin-58b9dddd5d-5vh9c -- nvidia-smi
Sat Feb 8 23:33:49 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01 Driver Version: 535.216.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Quadro P1000 On | 00000000:05:00.0 Off | N/A |
| 34% 28C P8 N/A / N/A | 4MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
* I've enabled Nvidia NVENC Transcoding in Jellyfin Playback Settings - enabling H264, MPEG2, VC1, VP9, and VP9 10 bit codecs (as per the compatibility specs for the Quadro P1000 here), and enabling "Enable enhanced NVDEC decoder" and "Enable hardware encoding".
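(As a sanity check on that last step, this is the sort of command I'd use to confirm the bundled ffmpeg exposes the NVENC encoders at all - the path assumes the official jellyfin/jellyfin image, which ships jellyfin-ffmpeg under /usr/lib/jellyfin-ffmpeg:)
Code:
$ kubectl exec -it -n jellyfin jellyfin-58b9dddd5d-5vh9c -- /usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner -encoders | grep nvenc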
And yet, when I "Play a video in the Jellyfin web client and trigger a video transcoding by setting a lower resolution or bitrate", or encounter hanging playback on a Kodi client, nvidia-smi on the Jellyfin pod shows no .../ffmpeg processes, and the hanging playback remains.

Any ideas on things I might have missed with Hardware Acceleration - or, for addressing hanging playback in general?