2024-01-04, 09:59 PM
I'm running Jellyfin on a Proxmox server, in a Debian 11 LXC, via docker-compose. This had been humming along without any issues until I ran a routine Proxmox update, after which transcoding suddenly broke. From what I can tell, passthrough is working fine, as I can run
nvidia-smi on the Proxmox host, in the Debian 11 LXC and inside the Docker container:
root@server# nvidia-smi
Thu Jan 4 21:45:26 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P600                    Off | 00000000:02:00.0 Off |                  N/A |
| 24%   40C    P0              N/A /  N/A |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
root@server#
Any hardware transcode fails with the following ffmpeg output (final four lines):
[AVHWDeviceContext @ 0x55cde7791d40] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
Device creation failed: -542398533.
Failed to set value 'cuda=cu:0' for option 'init_hw_device': Generic error in an external library
Error parsing global options: Generic error in an external library
This happens with any transcode; the example above is from transcoding an x265 HEVC mkv file:
/usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -init_hw_device cuda=cu:0 -filter_hw_device cu -hwaccel cuda -hwaccel_output_format cuda -threads 1 -autorotate 0 -i file:"/data/locationredacted/filenameredacted.mkv" -autoscale 0 -map_metadata -1 -map_chapters -1 -threads 8 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_nvenc -preset p1 -b:v 6707950 -maxrate 6707950 -bufsize 13415900 -g:v:0 75 -keyint_min:v:0 75 -vf "setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,scale_cuda=format=yuv420p" -codec:a:0 libfdk_aac -ac 2 -ab 192000 -ar 48000 -copyts -avoid_negative_ts disabled -max_muxing_queue_size 2048 -f hls -max_delay 5000000 -hls_time 3 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "/config/data/transcodes/6ada95c6cc43163d6d99479c3195bfa3%d.ts" -hls_playlist_type vod -hls_list_size 0 -y "/config/data/transcodes/6ada95c6cc43163d6d99479c3195bfa3.m3u8"
ffmpeg version 5.1.4-Jellyfin Copyright © 2000-2023 the FFmpeg developers
built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04)
configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-libs=-lfftw3f --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-ptx-compression --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-libdrm --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libsvtav1 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-opencl --enable-vaapi --enable-amf --enable-libmfx --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
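To take Jellyfin out of the picture, the same device initialisation can be tried with a stripped-down command. What follows is only a minimal sketch: it assumes the jellyfin-ffmpeg path from the log above and that the `lavfi` input is available in this build, and `check_cuda_init` is a hypothetical helper name, not something shipped with Jellyfin.

```shell
# Hypothetical helper: tries to create the same CUDA device Jellyfin asks for
# (-init_hw_device cuda=cu:0) while encoding a single dummy frame to nowhere.
# Usage: check_cuda_init [path-to-ffmpeg]
check_cuda_init() {
    # Default path is taken from the log above; override it if yours differs.
    ffmpeg_bin="${1:-/usr/lib/jellyfin-ffmpeg/ffmpeg}"
    if "$ffmpeg_bin" -hide_banner -v error -init_hw_device cuda=cu:0 \
        -f lavfi -i nullsrc=s=64x64 -frames:v 1 -f null - 2>&1; then
        echo "CUDA device init OK"
    else
        echo "CUDA device init FAILED"
        return 1
    fi
}
```

Running `check_cuda_init` inside the container should reproduce the `cuInit(0)` error without any media file involved, which would suggest the problem sits below Jellyfin (driver, device nodes or container permissions) rather than in the transcode settings.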
This is Jellyfin version 10.8.13. Here's the docker-compose:
version: '3'
services:
  jellyfin:
    image: lscr.io/linuxserver/jellyfin
    container_name: jellyfin
    network_mode: 'host'
    devices:
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools
      - /dev/nvidia-modeset
      - /dev/nvidiactl
      - /dev/nvidia0
      # - /dev/nvidia-caps
    restart: unless-stopped
    environment:
      - PUID=1001
      - PGID=1001
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
      - NVIDIA_VISIBLE_DEVICES=all
      - TZ=Europe/London
      - UMASK_SET=022 #optional
    volumes:
      - /opt/appdata/jellyfin:/config
      - /mnt/storage/jellyfin_transcodes:/config/data/transcodes
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Config in /etc/pve/lxc/102.conf includes:
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvram dev/nvram none bind,optional,create=file
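One thing that may be worth comparing against the excerpt above: setups like this often also carry `lxc.cgroup2.devices.allow` lines keyed on the device major numbers, and none appear in the excerpt (they may simply have been left out). As far as I know, the major number of `nvidia-uvm` is assigned dynamically, so it can differ after a driver or kernel update. The sketch below assumes a cgroup v2 Proxmox host; `nvidia_allow_lines` is a hypothetical helper that turns `ls -l /dev/nvidia*` output from the host into candidate lines.

```shell
# Hypothetical helper: reads `ls -l` output for the NVIDIA device nodes on
# stdin and prints one candidate cgroup allow line per distinct major number.
# Intended use on the Proxmox host:
#   ls -l /dev/nvidia* | nvidia_allow_lines
nvidia_allow_lines() {
    # Character devices start with "c"; field 5 is the major number followed
    # by a comma (for example "195,"), so strip the comma and keep the number.
    awk '$1 ~ /^c/ { sub(/,$/, "", $5); print $5 }' | sort -un |
    while read -r major; do
        printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
    done
}
```

If the printed major numbers differ from what is currently in the container config, that mismatch would be a plausible reason for `nvidia-smi` still working while `cuInit` fails, though this is a guess rather than a confirmed diagnosis.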
Given everything was working fine before the update (though I allow this could have been caused by other adjacent changes, such as a pull of the Docker image, which has also happened in the meantime), I've duly consulted the interwebs. There's a lot of noise about these NVIDIA drivers and quite a lot of conflicting information. I gather there may be a bug where Jellyfin loses track of the NVIDIA hardware device because its name has changed:
https://github.com/jellyfin/jellyfin/issues/9177
There's a very remote possibility that I'm hitting the (arbitrary) limit on concurrent hardware encoding sessions, as I haven't bothered to install the patched NVIDIA driver yet (we rarely run more than 2 streaming sessions concurrently):
https://github.com/keylase/nvidia-patch
I also gather there may be some codec issues, but I have struggled to follow exactly what those issues are:
https://github.com/Artiume/jellyfin-docs...ki/main.md
So now I turn to this fine community to see if other users have experienced something similar lately and what steps worked for you.