I have a Jellyfin installation on a Kubernetes cluster. Playback works just fine via a web client, but playback to Kodi clients on other devices regularly hangs for ~20-30 seconds at a time, every 10 seconds or so of playback. There are no logs on the Jellyfin pod at the time of the hangs that indicate a reason; here are logs from a Kodi client covering the period of several hangs, during which on-screen debug logging showed no notable change in memory or CPU usage and FPS kept reporting 55-60. Absent any better ideas, I have attempted to enable Hardware Acceleration.
(I'm aware of the XY Problem - any advice for alternative methods of addressing the playback hangs other than Hardware Acceleration would also be gratefully received!)
I've been following the guides here and here attempting to enable hardware acceleration (all terminal commands on my node rather than on the Jellyfin pod unless specified otherwise):
* I've installed a Quadro P1000 video card on one of the nodes of my cluster, and confirmed it shows up:
Code:
$ lspci -nn | grep -Ei "3d|display|vga"
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P1000] [10de:1cb1] (rev a1)
0a:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. G200eR2 [102b:0534] (rev 01)
* I've installed the NVIDIA driver (and rebooted) and NVIDIA Container Toolkit on the node:
Code:
$ nvidia-smi
Sat Feb 8 15:09:11 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01 Driver Version: 535.216.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Quadro P1000 On | 00000000:05:00.0 Off | N/A |
| 34% 27C P8 N/A / N/A | 4MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
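For reference, the driver and Container Toolkit install was roughly the following (a sketch from memory - package names assume a Debian/Ubuntu node with the NVIDIA apt repositories already configured, and the driver version matches the nvidia-smi output above):
Code:
# NVIDIA driver, then reboot
$ sudo apt-get install -y nvidia-driver-535
$ sudo reboot
# NVIDIA Container Toolkit
$ sudo apt-get install -y nvidia-container-toolkit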
* I followed these instructions to install the nvidia runtime, created a RuntimeClass for it, and enabled GPU support:
Code:
$ cat /etc/containerd/config.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/usr/lib/cni"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.grpc.v1.cri".containerd]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
[plugins."io.containerd.internal.v1.opt"]
path = "/var/lib/containerd/opt"
$ kubectl describe runtimeClass nvidia
Name: nvidia
Namespace:
Labels: <none>
Annotations: <none>
API Version: node.k8s.io/v1
Handler: nvidia
Kind: RuntimeClass
Metadata:
Creation Timestamp: 2025-02-08T21:57:21Z
Resource Version: 283479326
UID: 48cd5220-fc08-4c2a-87bc-8dfb2a30cf69
Events: <none>
$ kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
nvidia-device-plugin-daemonset 3 3 1 3 1 <none> 68m
# The pod deployed to the node with the GPU
$ kubectl -n kube-system logs nvidia-device-plugin-daemonset-lxxw8
I0208 22:08:40.694494 1 main.go:235] "Starting NVIDIA Device Plugin" version=<
d475b2cf
commit: d475b2cfcf12b983a4975d4fc59d91af432cf28e
>
I0208 22:08:40.694591 1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I0208 22:08:40.694638 1 main.go:245] Starting OS watcher.
I0208 22:08:40.694850 1 main.go:260] Starting Plugins.
I0208 22:08:40.694874 1 main.go:317] Loading configuration.
I0208 22:08:40.695846 1 main.go:342] Updating config with default resource matching patterns.
I0208 22:08:40.696075 1 main.go:353]
Running with config:
{
"version": "v1",
"flags": {
"migStrategy": "none",
"failOnInitError": false,
"mpsRoot": "",
"nvidiaDriverRoot": "/",
"nvidiaDevRoot": "/",
"gdsEnabled": false,
"mofedEnabled": false,
"useNodeFeatureAPI": null,
"deviceDiscoveryStrategy": "auto",
"plugin": {
"passDeviceSpecs": false,
"deviceListStrategy": [
"envvar"
],
"deviceIDStrategy": "uuid",
"cdiAnnotationPrefix": "cdi.k8s.io/",
"nvidiaCTKPath": "/usr/bin/nvidia-ctk",
"containerDriverRoot": "/driver-root"
}
},
"resources": {
"gpus": [
{
"pattern": "*",
"name": "nvidia.com/gpu"
}
]
},
"sharing": {
"timeSlicing": {}
},
"imex": {}
}
I0208 22:08:40.696086 1 main.go:356] Retrieving plugins.
I0208 22:08:40.719070 1 server.go:195] Starting GRPC server for 'nvidia.com/gpu'
I0208 22:08:40.721000 1 server.go:139] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I0208 22:08:40.723731 1 server.go:146] Registered device plugin for 'nvidia.com/gpu' with Kubelet
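For completeness, this is roughly what that step amounted to (a sketch rather than a verbatim command history - the containerd runtime entry shown above is what nvidia-ctk generates, and the device plugin DaemonSet itself came from the upstream nvidia-device-plugin manifests):
Code:
$ sudo nvidia-ctk runtime configure --runtime=containerd
$ sudo systemctl restart containerd
$ cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF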
* Following these instructions, I confirmed that NVIDIA GPUs can be requested by a container:
Code:
# Intentionally _not_ using the runtimeClassName to show that this fails:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  # runtimeClassName: nvidia
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
      resources:
        limits:
          nvidia.com/gpu: 1
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
EOF
$ kubectl logs gpu-pod
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]
# Now run the same `apply` but with the addition of `runtimeClassName: nvidia`
$ kubectl logs gpu-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
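(As an extra sanity check that the device plugin is actually advertising the GPU, I also checked the node's capacity/allocatable resources - <gpu-node-name> below is a placeholder for whichever node has the P1000, and both lines should report 1:)
Code:
$ kubectl describe node <gpu-node-name> | grep nvidia.com/gpu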
* I've set runtimeClassName=nvidia for the Jellyfin pod, and explicitly set the environment variables from here:
Code:
$ kubectl -n jellyfin describe pod jellyfin-58b9dddd5d-5vh9c
...
Runtime Class Name: nvidia
...
Containers:
jellyfin:
...
Environment:
NVIDIA_DRIVER_CAPABILITIES: all
NVIDIA_VISIBLE_DEVICES: all
...
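(The relevant portion of the Jellyfin Deployment spec looks roughly like this - a trimmed sketch rather than the full manifest:)
Code:
spec:
  template:
    spec:
      runtimeClassName: nvidia
      containers:
        - name: jellyfin
          env:
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: "all"
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"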
* nvidia-smi on the Jellyfin pod shows the video card:
Code:
$ kubectl exec -it -n jellyfin jellyfin-58b9dddd5d-5vh9c -- nvidia-smi
Sat Feb 8 23:33:49 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01 Driver Version: 535.216.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Quadro P1000 On | 00000000:05:00.0 Off | N/A |
| 34% 28C P8 N/A / N/A | 4MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
* I've enabled Nvidia NVENC Transcoding in Jellyfin Playback Settings - enabling H264, MPEG2, VC1, VP9, and VP9 10 bit codecs (as per the compatibility specs for the Quadro P1000 here), and enabling "Enable enhanced NVDEC decoder" and "Enable hardware encoding".
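(As a sanity check on that last step, this is the sort of command I'd use to confirm the bundled ffmpeg exposes the NVENC encoders at all - the path assumes the official jellyfin/jellyfin image, which ships jellyfin-ffmpeg under /usr/lib/jellyfin-ffmpeg:)
Code:
$ kubectl exec -it -n jellyfin jellyfin-58b9dddd5d-5vh9c -- /usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner -encoders | grep nvenc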
And yet, when I "Play a video in the Jellyfin web client and trigger a video transcoding by setting a lower resolution or bitrate", or encounter hanging playback on a Kodi client, nvidia-smi on the Jellyfin pod shows no .../ffmpeg processes, and the hanging playback remains.

Any ideas on things I might have missed with Hardware Acceleration - or, for addressing hanging playback in general?