
    Enable NVIDIA Hardware Transcode on k8s

    scubbo

    Junior Member

    Posts: 4
    Threads: 2
    Joined: 2023 Jun
    Reputation: 0
    #1
    2025-02-08, 11:46 PM (This post was last modified: 2025-02-08, 11:56 PM by scubbo. Edited 1 time in total.)
    I have a Jellyfin installation on a Kubernetes cluster. Playback works just fine via a web client, but playback to Kodi clients on other devices regularly hangs for ~20-30 seconds at a time, every 10 seconds or so of playback. There are no logs on the Jellyfin pod at the time of the hanging that indicate a reason. Here are logs from a Kodi client covering the period of several hangs; during that time, onscreen debug logging showed no notable change in memory or CPU usage, and FPS kept reporting 55-60. Absent any better ideas, I have attempted to enable Hardware Acceleration.

    (I'm aware of the XY Problem - any advice for alternative methods of addressing the playback hangs other than Hardware Acceleration would also be gratefully received!)

    I've been following the guides here and here to enable hardware acceleration (all terminal commands are run on my node rather than on the Jellyfin pod unless specified otherwise):

    * I've installed a Quadro P1000 video card on one of the nodes of my cluster, and confirmed it shows up:

    Code:
    $ lspci -nn | grep -Ei "3d|display|vga"
    05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P1000] [10de:1cb1] (rev a1)
    0a:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. G200eR2 [102b:0534] (rev 01)



    * I've installed the NVIDIA driver (and rebooted) and NVIDIA Container Toolkit on the node:


    Code:
    $ nvidia-smi
    Sat Feb  8 15:09:11 2025     
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.216.01            Driver Version: 535.216.01  CUDA Version: 12.2    |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf          Pwr:Usage/Cap |        Memory-Usage | GPU-Util  Compute M. |
    |                                        |                      |              MIG M. |
    |=========================================+======================+======================|
    |  0  Quadro P1000                  On  | 00000000:05:00.0 Off |                  N/A |
    | 34%  27C    P8              N/A /  N/A |      4MiB /  4096MiB |      0%      Default |
    |                                        |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
                                                                                           
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU  GI  CI        PID  Type  Process name                            GPU Memory |
    |        ID  ID                                                            Usage      |
    |=======================================================================================|
    |  No running processes found                                                          |
    +---------------------------------------------------------------------------------------+
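
    For reference, this is roughly what the driver/toolkit install looked like on the node. This is only a sketch: it assumes a Debian/Ubuntu-style node with NVIDIA's apt repository already configured, and the driver package name will vary by distro.

    Code:
    # Sketch only -- package names are assumptions for an Ubuntu-style node
    $ sudo apt-get install -y nvidia-driver-535
    $ sudo apt-get install -y nvidia-container-toolkit
    # These two commands are what write the runtimes.nvidia section shown in config.toml below
    $ sudo nvidia-ctk runtime configure --runtime=containerd
    $ sudo systemctl restart containerd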


    * I followed these instructions to install the nvidia runtime, created a RuntimeClass for it, and enabled GPU support:

    Code:
    $ cat /etc/containerd/config.toml
    version = 2

    [plugins]

      [plugins."io.containerd.grpc.v1.cri"]

        [plugins."io.containerd.grpc.v1.cri".cni]
          bin_dir = "/usr/lib/cni"
          conf_dir = "/etc/cni/net.d"

        [plugins."io.containerd.grpc.v1.cri".containerd]

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
              privileged_without_host_devices = false
              runtime_engine = ""
              runtime_root = ""
              runtime_type = "io.containerd.runc.v2"

              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
                BinaryName = "/usr/bin/nvidia-container-runtime"

      [plugins."io.containerd.internal.v1.opt"]
        path = "/var/lib/containerd/opt"

    $ kubectl describe runtimeClass nvidia
    Name:        nvidia
    Namespace:   
    Labels:      <none>
    Annotations:  <none>
    API Version:  node.k8s.io/v1
    Handler:      nvidia
    Kind:        RuntimeClass
    Metadata:
      Creation Timestamp:  2025-02-08T21:57:21Z
      Resource Version:    283479326
      UID:                48cd5220-fc08-4c2a-87bc-8dfb2a30cf69
    Events:                <none>
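
    # For reference, the RuntimeClass above can be created from a manifest like this
    # (a minimal sketch; the handler must match the "nvidia" runtime name in containerd's config):
    $ cat <<EOF | kubectl apply -f -
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: nvidia
    handler: nvidia
    EOF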

    $ kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset
    NAME                            DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
    nvidia-device-plugin-daemonset  3        3        1      3            1          <none>          68m
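
    # (How the device plugin itself was installed isn't shown here; a common route is
    #  NVIDIA's static manifest, e.g. -- the version below is an assumption:
    #  kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.0/deployments/static/nvidia-device-plugin.yml)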

    # The pod deployed to the node with the GPU
    $ kubectl -n kube-system logs nvidia-device-plugin-daemonset-lxxw8
    I0208 22:08:40.694494      1 main.go:235] "Starting NVIDIA Device Plugin" version=<
    d475b2cf
    commit: d475b2cfcf12b983a4975d4fc59d91af432cf28e
    >
    I0208 22:08:40.694591      1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
    I0208 22:08:40.694638      1 main.go:245] Starting OS watcher.
    I0208 22:08:40.694850      1 main.go:260] Starting Plugins.
    I0208 22:08:40.694874      1 main.go:317] Loading configuration.
    I0208 22:08:40.695846      1 main.go:342] Updating config with default resource matching patterns.
    I0208 22:08:40.696075      1 main.go:353]
    Running with config:
    {
      "version": "v1",
      "flags": {
        "migStrategy": "none",
        "failOnInitError": false,
        "mpsRoot": "",
        "nvidiaDriverRoot": "/",
        "nvidiaDevRoot": "/",
        "gdsEnabled": false,
        "mofedEnabled": false,
        "useNodeFeatureAPI": null,
        "deviceDiscoveryStrategy": "auto",
        "plugin": {
          "passDeviceSpecs": false,
          "deviceListStrategy": [
            "envvar"
          ],
          "deviceIDStrategy": "uuid",
          "cdiAnnotationPrefix": "cdi.k8s.io/",
          "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
          "containerDriverRoot": "/driver-root"
        }
      },
      "resources": {
        "gpus": [
          {
            "pattern": "*",
            "name": "nvidia.com/gpu"
          }
        ]
      },
      "sharing": {
        "timeSlicing": {}
      },
      "imex": {}
    }
    I0208 22:08:40.696086      1 main.go:356] Retrieving plugins.
    I0208 22:08:40.719070      1 server.go:195] Starting GRPC server for 'nvidia.com/gpu'
    I0208 22:08:40.721000      1 server.go:139] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
    I0208 22:08:40.723731      1 server.go:146] Registered device plugin for 'nvidia.com/gpu' with Kubelet
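
    One extra sanity check at this point (not from the guides, just a suggestion): confirm that the GPU node now advertises the nvidia.com/gpu resource that the device plugin registers.

    Code:
    # <gpu-node> is a placeholder for whichever node has the Quadro P1000
    $ kubectl describe node <gpu-node> | grep -i "nvidia.com/gpu"
    # it should appear under both Capacity and Allocatable with a count of 1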

    * Following these instructions, I confirmed that NVIDIA GPUs can be requested by a container:

    Code:
    # Intentionally _not_ using the runtimeClassName to show that this fails:
    $ cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-pod
    spec:
      restartPolicy: Never
      # runtimeClassName: nvidia
      containers:
        - name: cuda-container
          image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
          resources:
            limits:
              nvidia.com/gpu: 1
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
    EOF

    $ kubectl logs gpu-pod
    Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
    [Vector addition of 50000 elements]

    # Now run the same `apply` but with the addition of `runtimeClassName: nvidia`

    $ kubectl logs gpu-pod
    [Vector addition of 50000 elements]
    Copy input data from the host memory to the CUDA device
    CUDA kernel launch with 196 blocks of 256 threads
    Copy output data from the CUDA device to the host memory
    Test PASSED
    Done

    * I've set runtimeClassName=nvidia for the Jellyfin pod, and explicitly set the environment variables from here:

    Code:
    $ kubectl -n jellyfin describe pod jellyfin-58b9dddd5d-5vh9c
    ...
    Runtime Class Name:  nvidia
    ...
    Containers:
      jellyfin:
        ...
        Environment:
          NVIDIA_DRIVER_CAPABILITIES:  all
          NVIDIA_VISIBLE_DEVICES:      all
    ...
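
    For completeness, the relevant fragment of the Deployment's pod template looks roughly like this (a sketch reconstructed from the describe output above, not the exact manifest):

    Code:
    # Sketch of the pod template changes; names mirror the describe output above
    spec:
      template:
        spec:
          runtimeClassName: nvidia
          containers:
            - name: jellyfin
              env:
                - name: NVIDIA_DRIVER_CAPABILITIES
                  value: "all"
                - name: NVIDIA_VISIBLE_DEVICES
                  value: "all"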

    * nvidia-smi on the Jellyfin pod shows the video card:

    Code:
    $ kubectl exec -it -n jellyfin jellyfin-58b9dddd5d-5vh9c -- nvidia-smi
    Sat Feb  8 23:33:49 2025     
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.216.01            Driver Version: 535.216.01  CUDA Version: 12.2    |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf          Pwr:Usage/Cap |        Memory-Usage | GPU-Util  Compute M. |
    |                                        |                      |              MIG M. |
    |=========================================+======================+======================|
    |  0  Quadro P1000                  On  | 00000000:05:00.0 Off |                  N/A |
    | 34%  28C    P8              N/A /  N/A |      4MiB /  4096MiB |      0%      Default |
    |                                        |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
                                                                                           
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU  GI  CI        PID  Type  Process name                            GPU Memory |
    |        ID  ID                                                            Usage      |
    |=======================================================================================|
    |  No running processes found                                                          |
    +---------------------------------------------------------------------------------------+
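
    As an additional check (assuming the official jellyfin/jellyfin image, where the bundled ffmpeg lives at /usr/lib/jellyfin-ffmpeg/ffmpeg), the container's ffmpeg can be asked whether it sees the NVENC encoders at all:

    Code:
    # Path assumes the official Jellyfin container image
    $ kubectl exec -it -n jellyfin jellyfin-58b9dddd5d-5vh9c -- \
        /usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner -encoders | grep nvenc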

    * I've enabled Nvidia NVENC transcoding in the Jellyfin Playback settings - enabling the H264, MPEG2, VC1, VP9, and VP9 10 bit codecs (as per the compatibility specs for the Quadro P1000 here), and enabling "Enable enhanced NVDEC decoder" and "Enable hardware encoding".

    And yet, when I "play a video in the Jellyfin web client and trigger a video transcoding by setting a lower resolution or bitrate", or encounter hanging playback on a Kodi client, nvidia-smi on the Jellyfin pod shows no .../ffmpeg processes, and the hanging playback remains.

    Any ideas on things I might have missed with Hardware Acceleration - or, for addressing hanging playback in general?
    TheDreadPirate

    Community Moderator

    Posts: 15,375
    Threads: 10
    Joined: 2023 Jun
    Reputation: 460
    Country:United States
    #2
    2025-02-10, 01:17 PM
    Kodi will almost always direct play the video, meaning no ffmpeg process will spawn. ffmpeg only starts when transcoding is needed.

    Can you describe your networking setup? And whether you're using a reverse proxy? What you're describing sounds more like network related buffering.

    Can you share your full jellyfin log via privatebin.net?
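
    For a Kubernetes deployment, pulling the full log can look something like this (a sketch; the /config/log path assumes the official image's default config volume layout):

    Code:
    $ kubectl -n jellyfin logs deploy/jellyfin --since=24h > jellyfin.log
    # or list the rolling log files on the config volume:
    $ kubectl -n jellyfin exec deploy/jellyfin -- ls /config/log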
    Jellyfin 10.10.7 (Docker)
    Ubuntu 24.04.2 LTS w/HWE
    Intel i3 12100
    Intel Arc A380
    OS drive - SK Hynix P41 1TB
    Storage
        4x WD Red Pro 6TB CMR in RAIDZ1