Hardware Encoding broken after updating proxmox - Nanometer3912 - 2024-01-04
I'm running Jellyfin on a Proxmox server in a Debian 11 LXC via docker-compose. It had been humming along without any issues until I ran a routine Proxmox update, and now transcoding is broken. From what I can tell, passthrough is working fine, as I can run nvidia-smi on the Proxmox host, in the Debian 11 LXC and inside the Docker container:
root@server# nvidia-smi
Thu Jan 4 21:45:26 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P600                    Off | 00000000:02:00.0 Off |                  N/A |
| 24%   40C    P0              N/A /  N/A |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
root@server#
Any hw transcode fails with the following ffmpeg output (final four lines):
[AVHWDeviceContext @ 0x55cde7791d40] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
Device creation failed: -542398533.
Failed to set value 'cuda=cu:0' for option 'init_hw_device': Generic error in an external library
Error parsing global options: Generic error in an external library
This happens with any transcode; the example above comes from transcoding an x265 HEVC mkv file with this command:
/usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -init_hw_device cuda=cu:0 -filter_hw_device cu -hwaccel cuda -hwaccel_output_format cuda -threads 1 -autorotate 0 -i file:"/data/locationredacted/filenameredacted.mkv" -autoscale 0 -map_metadata -1 -map_chapters -1 -threads 8 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_nvenc -preset p1 -b:v 6707950 -maxrate 6707950 -bufsize 13415900 -g:v:0 75 -keyint_min:v:0 75 -vf "setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,scale_cuda=format=yuv420p" -codec:a:0 libfdk_aac -ac 2 -ab 192000 -ar 48000 -copyts -avoid_negative_ts disabled -max_muxing_queue_size 2048 -f hls -max_delay 5000000 -hls_time 3 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "/config/data/transcodes/6ada95c6cc43163d6d99479c3195bfa3%d.ts" -hls_playlist_type vod -hls_list_size 0 -y "/config/data/transcodes/6ada95c6cc43163d6d99479c3195bfa3.m3u8"
ffmpeg version 5.1.4-Jellyfin Copyright © 2000-2023 the FFmpeg developers
built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04)
configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-libs=-lfftw3f --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-ptx-compression --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-libdrm --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libsvtav1 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-opencl --enable-vaapi --enable-amf --enable-libmfx --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
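The failure above happens while ffmpeg parses `-init_hw_device cuda=cu:0`, before any input is opened, so it can probably be reproduced without Jellyfin or a media file. A minimal sketch, assuming the ffmpeg path from the log above; the function name `check_cuda_init` is made up for illustration, and `nullsrc` is ffmpeg's built-in lavfi test source:

```shell
# Ask ffmpeg to create the CUDA device and process one empty test frame.
# $1 is the ffmpeg binary; it defaults to the path seen in the log above.
check_cuda_init() {
    ffmpeg_bin="${1:-/usr/lib/jellyfin-ffmpeg/ffmpeg}"
    if "$ffmpeg_bin" -hide_banner -v error \
        -init_hw_device cuda=cu:0 \
        -f lavfi -i nullsrc=s=64x64:d=1 \
        -frames:v 1 -f null - ; then
        echo "cuda init ok"
    else
        # Same code path that prints "cu->cuInit(0) failed" in the log.
        echo "cuda init failed"
        return 1
    fi
}
```

Running `check_cuda_init` on the host (if ffmpeg is installed there), in the LXC and inside the Docker container should show at which layer CUDA initialisation stops working, which `nvidia-smi` alone does not reveal.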
This is Jellyfin version 10.8.13. Here's the docker-compose:
version: '3'
services:
  jellyfin:
    image: lscr.io/linuxserver/jellyfin
    container_name: jellyfin
    network_mode: 'host'
    devices:
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools
      - /dev/nvidia-modeset
      - /dev/nvidiactl
      - /dev/nvidia0
      # - /dev/nvidia-caps
    restart: unless-stopped
    environment:
      - PUID=1001
      - PGID=1001
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
      - NVIDIA_VISIBLE_DEVICES=all
      - TZ=Europe/London
      - UMASK_SET=022 #optional
    volumes:
      - /opt/appdata/jellyfin:/config
      - /mnt/storage/jellyfin_transcodes:/config/data/transcodes
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Config in /etc/pve/lxc/102.conf includes:
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvram dev/nvram none bind,optional,create=file
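Every entry above is marked `optional`, so LXC skips a bind mount whose source is missing on the host instead of failing the container start. One thing worth ruling out after a host update is that a device node such as /dev/nvidia-uvm simply was not there when the container started. A minimal sketch for the Proxmox host, assuming the config format shown above; the function name `missing_mount_sources` is made up for illustration:

```shell
# Print the host-side source path of every lxc.mount.entry whose source
# does not exist. $1 is the container config file.
missing_mount_sources() {
    conf="$1"
    # Field 1 is "lxc.mount.entry:", field 2 is the source path on the host.
    awk '$1 == "lxc.mount.entry:" { print $2 }' "$conf" |
    while read -r src; do
        [ -e "$src" ] || echo "missing: $src"
    done
}
```

Called as `missing_mount_sources /etc/pve/lxc/102.conf`, it prints nothing when every source exists. As far as I know, nvidia-smi does not need /dev/nvidia-uvm while CUDA initialisation does, which would fit a working nvidia-smi alongside a failing cuInit.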
Given everything was working fine before the update (though I allow this could have been caused by other adjacent changes, such as a pull of the docker image, which has also happened in the meantime), I've duly consulted the interwebs. There's a lot of noise about these nvidia drivers and quite a lot of conflicting information. I gather there may be a bug with Jellyfin losing track of the nvidia hw device because its name has changed:
https://github.com/jellyfin/jellyfin/issues/9177
There's a very remote possibility that I'm hitting (arbitrary) limits of concurrent hw encoding, as I haven't bothered to install the patched nvidia driver yet (we rarely run more than 2 streaming sessions concurrently):
https://github.com/keylase/nvidia-patch
I also gather there may be some codec issues, but have struggled to follow exactly what the issues there are:
https://github.com/Artiume/jellyfin-docs/blob/master/general/wiki/main.md
So now I turn to this fine community to see if other users have experienced something similar lately and what steps worked for you.
RE: Hardware Encoding broken after updating proxmox - User 5268 - 2024-01-07
I have a similar problem. I run Jellyfin as a standalone application in an LXC. I have assigned my nvidia M2000 to the LXC container. It worked fine until December 2023. Now it shows an error:
ffmpeg version 6.0.1-Jellyfin Copyright © 2000-2023 the FFmpeg developers
built with gcc 12 (Debian 12.2.0-14)
configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-ptx-compression --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-libdrm --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libsvtav1 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-opencl --enable-vaapi --enable-amf --enable-libvpl --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
libavutil 58. 2.100 / 58. 2.100
libavcodec 60. 3.100 / 60. 3.100
libavformat 60. 3.100 / 60. 3.100
libavdevice 60. 1.100 / 60. 1.100
libavfilter 9. 3.100 / 9. 3.100
libswscale 7. 1.100 / 7. 1.100
libswresample 4. 10.100 / 4. 10.100
libpostproc 57. 1.100 / 57. 1.100
[AVHWDeviceContext @ 0x5594b6a55bc0] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
Device creation failed: -542398533.
Failed to set value 'cuda=cu:0' for option 'init_hw_device': Generic error in an external library
Error parsing global options: Generic error in an external library
I have experimented with different methods but the error is still there. I have tried the NVIDIA drivers from the official website and from the Debian repository, with the CUDA toolkit and without it, with jellyfin-ffmpeg5 and jellyfin-ffmpeg6; nothing makes a difference and the error continues. nvidia-smi shows no transcoding going on either. This functioned flawlessly in the past; I did not change anything except the periodic apt updates and it stopped working.
Does anyone have a solution for this?
RE: Hardware Encoding broken after updating proxmox - TheDreadPirate - 2024-01-07
Check out this walkthrough a community member wrote.
https://forum.jellyfin.org/t-proxmox-lxc-with-nvidia-transcoding-and-network-share
RE: Hardware Encoding broken after updating proxmox - jellyshield - 2024-01-11
For me it used to work with just the latest driver from NVIDIA installed on both the host and the LXC. I didn't even have to install libnvcuvid1 and libnvidia-encode1 separately. After some kernel update, jellyfin-ffmpeg5/6 is broken. I did what was mentioned and downgraded the driver to match the Debian version. I was also able to install libnvcuvid1 and libnvidia-encode1 in the LXC and successfully ran nvidia-smi on both the host and the LXC. I still have the same error.
Both libraries are installed:
Code:
root@jellyfin:~# apt install -y libnvcuvid1 libnvidia-encode1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libnvcuvid1 is already the newest version (525.147.05-4~deb12u1).
libnvidia-encode1 is already the newest version (525.147.05-4~deb12u1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
NVIDIA-SMI
Code:
root@jellyfin:~# nvidia-smi
Thu Jan 11 00:43:19 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M2000        On   | 00000000:21:00.0 Off |                  N/A |
| 56%   28C    P8     6W /  75W |      1MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
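The apt output and the nvidia-smi output above both report 525.147.05, so the user-space libraries and the driver appear to agree in this case. For anyone who wants to confirm the same on their own setup, here is a small sketch of that comparison. The helper names `upstream_version` and `versions_match` are made up for illustration; the only assumption about Debian package versions is the usual `[epoch:]upstream-revision` layout:

```shell
# Reduce a Debian package version such as "525.147.05-4~deb12u1" to its
# upstream part ("525.147.05"): drop an epoch prefix, then the revision.
upstream_version() {
    v="${1#*:}"
    printf '%s\n' "${v%%-*}"
}

# Succeed when the driver version and the package version agree.
versions_match() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# Example wiring inside the LXC (not executed here):
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   pkg=$(dpkg-query -W -f='${Version}' libnvcuvid1)
#   versions_match "$drv" "$pkg" && echo "match" || echo "mismatch: $drv vs $pkg"
```

A match here does not prove CUDA will initialise; it only removes a driver/library version mismatch from the list of suspects.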
I still get the same error in Jellyfin during transcoding:
Code:
ffmpeg version 5.1.4-Jellyfin Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12 (Debian 12.2.0-14)
configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-libs=-lfftw3f --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-ptx-compression --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-libdrm --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libsvtav1 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-opencl --enable-vaapi --enable-amf --enable-libmfx --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
[AVHWDeviceContext @ 0x562bf18b8d00] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
Device creation failed: -542398533.
Failed to set value 'cuda=cu:0' for option 'init_hw_device': Generic error in an external library
Error parsing global options: Generic error in an external library
RE: Hardware Encoding broken after updating proxmox - TheDreadPirate - 2024-01-11
Double check on the host that the group IDs didn't change for the nvidia devices in /dev/dri after a kernel and/or driver update. Update your lxc config accordingly if they did change.
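A rough sketch of that check, to be run on the Proxmox host. It prints the owning group and the device numbers of a node, and lists the character-device major numbers an LXC config allows, so the two can be compared by eye. The function names are made up for illustration, and it assumes the container config carries `lxc.cgroup2.devices.allow` (or the older `lxc.cgroup.devices.allow`) lines, which the excerpt earlier in the thread does not show:

```shell
# Print owning group and major:minor (decimal) of a device node.
# GNU stat reports %t/%T in hex, so printf converts them.
device_ids() {
    gid=$(stat -c '%g' "$1") || return 1
    hex_major=$(stat -c '%t' "$1") || return 1
    hex_minor=$(stat -c '%T' "$1") || return 1
    printf '%s gid=%s dev=%d:%d\n' "$1" "$gid" "0x$hex_major" "0x$hex_minor"
}

# Print the character-device major numbers allowed in an LXC config ($1).
allowed_majors() {
    sed -n 's/^lxc\.cgroup2\{0,1\}\.devices\.allow:[[:space:]]*c[[:space:]]*\([0-9][0-9]*\):.*/\1/p' "$1"
}

# Example wiring on the host (not executed here):
#   for n in /dev/nvidia* /dev/dri/*; do [ -c "$n" ] && device_ids "$n"; done
#   allowed_majors /etc/pve/lxc/102.conf
```

If a node reports a major number that `allowed_majors` does not list, the config would need updating. The device nodes named in this thread live directly under /dev, so the example loop covers both /dev/nvidia* and /dev/dri.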