2024-04-29, 03:49 PM
(This post was last modified: 2024-04-29, 04:00 PM by aj_pinner. Edited 6 times in total.)
Hi Jellyfinners,
I just built a new Jellyfin media server with a dedicated A380 GPU for transcoding. I followed the guide here exactly: https://jellyfin.org/docs/general/admini...tion/intel.
About 1 boot in 5, my GPU does not work at all. I see a "Failed to initialize GPU, declaring it wedged!" error in the kernel log. When this error happens, ffmpeg errors out and the Jellyfin client can't play back the video.
Sometimes rebooting fixes the issue and Jellyfin works as expected; sometimes the system comes back up with the GPU still wedged. Either way, I can't have a media server that fails 20% of the time. I believe the problem is with the last part of the documentation: https://jellyfin.org/docs/general/admini...e-on-linux. Has anyone successfully followed the doc and got a working system? Is it an Intel driver issue, or is it possibly a bad GPU? Should I try another driver? Does anyone know a stable version?
System info:
Version: 10.8.13 from official docker image
Host: Ubuntu Server 22.04 with Hardware Enablement Stack and firmware-linux-nonfree driver
Kernel: Linux itx 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Graphics card: ASRock Challenger A380
Here are some logs from when the system will not play back video:
The key message here is: *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
dmesg | grep i915:
Code:
[ 1.919628] i915 0000:03:00.0: vgaarb: deactivate vga console
[ 1.919672] i915 0000:03:00.0: [drm] Local memory IO size: 0x000000017c800000
[ 1.919675] i915 0000:03:00.0: [drm] Local memory available: 0x000000017c800000
[ 1.933363] i915 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 1.936221] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[ 1.982883] i915 0000:03:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.5.1
[ 1.982887] i915 0000:03:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.3
[ 3.018230] i915 0000:03:00.0: [drm] GT0: GUC: load failed: status = 0x80000534, time = 1001ms, freq = 2400MHz, ret = -110
[ 3.018261] i915 0000:03:00.0: [drm] GT0: GUC: load failed: status: Reset = 0, BootROM = 0x1A, UKernel = 0x05, MIA = 0x00, Auth = 0x02
[ 3.018278] i915 0000:03:00.0: [drm] GT0: GUC: still extracting hwconfig table.
[ 3.018755] i915 0000:03:00.0: [drm] *ERROR* GT0: GuC initialization failed -ETIMEDOUT
[ 3.018765] i915 0000:03:00.0: [drm] *ERROR* GT0: Enabling uc failed (-5)
[ 3.018773] i915 0000:03:00.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
[ 3.025801] i915 0000:03:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_set_wedged_on_init+0x34/0x50 [i915]
[ 3.084559] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on minor 0
[ 3.126572] fbcon: i915drmfb (fb0) is primary device
[ 3.228997] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
[ 5.080356] mei_gsc i915.mei-gscfi.768: cl:host=01 me=32 fw disconnect request received
[ 5.080383] mei i915.mei-gscfi.768-e2c2afa2-3817-4d19-9d95-06b16b588a5d: cannot connect
[ 5.083341] mei_gsc i915.mei-gscfi.768: FW not ready: resetting: dev_state = 2 pxp = 0
[ 5.083404] mei_gsc i915.mei-gscfi.768: unexpected reset: dev_state = ENABLED fw status = 00000345 84670000 00000000 00000000 E0020002 00000000
[ 5.083475] mei_gsc i915.mei-gsc.768: FW not ready: resetting: dev_state = 2 pxp = 2
[ 5.083499] mei_gsc i915.mei-gsc.768: unexpected reset: dev_state = ENABLED fw status = 00000345 84670000 00000000 00000000 E0020002 00000000
[ 5.167953] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops i915_audio_component_bind_ops [i915])
[ 5.469923] i915 0000:03:00.0: [drm] *ERROR* failed to load huc via gsc -8
[ 5.469940] mei_pxp i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: failed to bind 0000:03:00.0 (ops i915_pxp_tee_component_ops [i915]): -8
[ 5.470322] mei_pxp i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: adev bind failed: -8
[ 5.470776] mei_pxp i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Master comp add failed -8
[ 5.470780] mei_pxp: probe of i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1 failed with error -8
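For anyone who wants to check for this state after each boot instead of reading the log by hand, here is a minimal sketch. It assumes the kernel log text has already been captured (for example from dmesg, which may need root on Ubuntu) and only matches the two message formats shown in the log above. The function name check_i915_log is made up for illustration and is not part of Jellyfin or the i915 driver.

```python
import re

# Patterns copied from the dmesg output above. Other kernel versions may word
# these messages differently, so treat both patterns as assumptions.
WEDGED = re.compile(r"i915 .*\*ERROR\* .*declaring it wedged")
GUC_FAIL = re.compile(r"i915 .*GUC: load failed: status = (0x[0-9a-fA-F]+)")


def check_i915_log(text):
    """Return (wedged, guc_status) for a block of kernel log text.

    wedged     -- True if an i915 "declaring it wedged" error is present
    guc_status -- first GuC load-failure status code found, or None
    """
    wedged = False
    guc_status = None
    for line in text.splitlines():
        if WEDGED.search(line):
            wedged = True
        match = GUC_FAIL.search(line)
        if match and guc_status is None:
            guc_status = match.group(1)
    return wedged, guc_status


# Two lines taken from the log above, used as sample input.
sample = (
    "[ 3.018230] i915 0000:03:00.0: [drm] GT0: GUC: load failed: "
    "status = 0x80000534, time = 1001ms, freq = 2400MHz, ret = -110\n"
    "[ 3.018773] i915 0000:03:00.0: [drm] *ERROR* GT0: "
    "Failed to initialize GPU, declaring it wedged!\n"
)
print(check_i915_log(sample))
```

Run against the full log of each boot, this would make it easy to count how often the GPU comes up wedged and whether the GuC status code is the same every time.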
Researching this "wedged" issue, I found posts going back to 2014 from users reporting the error, but they are always about much older kernels and about getting iGPUs working.
Here is the driver version:
Code:
28 -rw-r--r-- 1 root root 25716 Feb 21 09:32 icl_dmc_ver1_07.bin
28 -rw-r--r-- 1 root root 25952 Feb 21 09:32 icl_dmc_ver1_09.bin
372 -rw-r--r-- 1 root root 380096 Feb 21 09:32 icl_guc_32.0.3.bin
380 -rw-r--r-- 1 root root 385280 Feb 21 09:32 icl_guc_33.0.0.bin
320 -rw-r--r-- 1 root root 324160 Feb 21 09:32 icl_guc_49.0.1.bin
320 -rw-r--r-- 1 root root 327488 Feb 21 09:32 icl_guc_62.0.0.bin
336 -rw-r--r-- 1 root root 343360 Feb 21 09:32 icl_guc_69.0.3.bin
272 -rw-r--r-- 1 root root 274496 Feb 21 09:32 icl_guc_70.1.1.bin
488 -rw-r--r-- 1 root root 498880 Feb 21 09:32 icl_huc_9.0.0.bin
480 -rw-r--r-- 1 root root 488960 Feb 21 09:32 icl_huc_ver8_4_3238.bin
Jellyfin log when the GPU is in this state:
Code:
[10:56:46] [ERR] [360] Jellyfin.Server.Middleware.ExceptionMiddleware: Error processing request. URL GET /videos/6eedea3a-5b2a-6f34-4bf5-fc38689342f6/hls1/main/0.ts.
MediaBrowser.Common.FfmpegException: FFmpeg exited with code 1
Client error when the GPU is in this state:
Code:
The client isn't compatible with the media and the server isn't sending a compatible media format.
I tried this twice, reconfiguring the entire server, and got the same results the second time: 4 out of 5 boots work and hardware transcoding appears to work as normal (I get 600 fps), but 20% of the time I get an unusable GPU. If I made a mistake in following the doc, where would it be? Any information that can help troubleshoot would be greatly appreciated, as I have 2 weeks to decide whether to return the GPU as bad hardware.