New system boots with a380 GPU wedged and will not playback video
Jellyfin Forum (https://forum.jellyfin.org), Support > Troubleshooting
New system boots with a380 GPU wedged and will not playback video - aj_pinner - 2024-04-29

Hi Jellyfinners,

I just built a new Jellyfin media server with a dedicated A380 GPU for transcoding. I followed the guide here exactly: https://jellyfin.org/docs/general/administration/hardware-acceleration/intel.

About 1 boot in 5, my GPU does not work at all. I see a "Failed to initialize GPU, declaring it wedged!" error in the kernel log. When this error happens, ffmpeg errors out and the Jellyfin client can't play back the video. Sometimes rebooting fixes the issue and Jellyfin works as expected; sometimes the system comes back up with the GPU still wedged. Either way, I can't have a media server that fails 20% of the time.

I believe the problem is with the last part of the documentation: https://jellyfin.org/docs/general/administration/hardware-acceleration/intel#configure-and-verify-lp-mode-on-linux. Has anyone successfully followed the doc and got a working system? Is it an Intel driver issue, or is it possibly a bad GPU? Should I try another driver? Does anyone know a stable version?

System info:
Version: 10.8.13 from the official Docker image
Host: Ubuntu Server 22.04 with the Hardware Enablement Stack and the firmware-linux-nonfree driver
Kernel: Linux itx 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Graphics card: ASRock Challenger A380

Here are some logs from when the system will not play back video. The key message is:
*ERROR* GT0: Failed to initialize GPU, declaring it wedged!

dmesg | grep i915:
Code:
[ 1.919628] i915 0000:03:00.0: vgaarb: deactivate vga console

Researching this "wedged" issue, I found posts going back to 2014 with users reporting the error, but they are always about much older kernels and about getting this working on iGPUs.
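A minimal sketch of checking whether the current boot hit the error above, assuming a systemd host where `journalctl -k -b 0` prints this boot's kernel messages (`dmesg` works too but may need root). The helper name `gpu_is_wedged` is hypothetical; it reads log text on stdin, so either source can be piped in.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the kernel log text on stdin contains the
# i915 "declaring it wedged" error quoted in this thread, non-zero otherwise.
gpu_is_wedged() {
    grep -q 'Failed to initialize GPU, declaring it wedged'
}

# Example usage (systemd host; reads only the current boot's kernel log):
#   if journalctl -k -b 0 | gpu_is_wedged; then
#       echo "i915 reported a wedged GPU on this boot" >&2
#   fi
```

It only reports the condition; what to do about it (log, alert, reboot) is left to the caller.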
Here are the firmware file versions:
Code:
28 -rw-r--r-- 1 root root 25716 Feb 21 09:32 icl_dmc_ver1_07.bin

Jellyfin log when the GPU is in this state:
Code:
[10:56:46] [ERR] [360] Jellyfin.Server.Middleware.ExceptionMiddleware: Error processing request. URL GET /videos/6eedea3a-5b2a-6f34-4bf5-fc38689342f6/hls1/main/0.ts.

Client error when the GPU is in this state:
Code:
The client isn't compatible with the media and the server isn't sending a compatible media format.

I tried this twice, reconfiguring the entire server the second time, and got the same results: 4 out of 5 boots work and hardware transcoding appears to work as normal (I get 600 fps), but 20% of the time I get an unusable GPU. If I made a mistake in following the doc, where would it be? Any information that can help me troubleshoot would be greatly appreciated, since I need to decide within 2 weeks whether to return the GPU if it is bad hardware.

RE: New system boots with a380 GPU wedged and will not playback video - TheDreadPirate - 2024-04-29

I have an A380 in my server and was running 22.04 with the 6.5 HWE kernel for a while. I never had anything like what you are describing happen. Do you have more than 1 GPU in the system, including an Intel iGPU?

RE: New system boots with a380 GPU wedged and will not playback video - aj_pinner - 2024-04-29

Hi TheDreadPirate, I was hoping to hear from you since I saw you commenting on other Intel Arc threads; thanks for replying. I have a few questions for you that might help me narrow down the issue, since we have the same GPU and a similar setup. I picked the hardware based on Jellyfin's recommendations, but I am not having much luck so far.

No, I only have the A380. I bought an F-series processor, so the GPU handles all video. This build is just for Jellyfin. Initially it looked like 20% of boots returned the GPU wedged error.
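To double-check which GPUs the kernel actually exposes (with an F-series CPU there should be no iGPU), the DRM render nodes can be listed together with their PCI address and bound driver. This is a sketch assuming the standard sysfs layout under `/sys/class/drm` and GNU-style `readlink -f`; `list_render_nodes` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: for each DRM render node, print
# "<node> <PCI address> <driver>". Defaults to /sys/class/drm; pass another
# directory with the same layout to inspect something else.
list_render_nodes() {
    drm_dir=${1:-/sys/class/drm}
    for node in "$drm_dir"/renderD*; do
        # Skip the literal pattern when nothing matched, or nodes without a device link.
        [ -e "$node/device" ] || continue
        pci=$(basename "$(readlink -f "$node/device")")
        if [ -e "$node/device/driver" ]; then
            drv=$(basename "$(readlink -f "$node/device/driver")")
        else
            drv=unbound
        fi
        printf '%s %s %s\n' "${node##*/}" "$pci" "$drv"
    done
}

# Example usage:
#   list_render_nodes
```

On a system like the one described here, one would expect a single line for the card at 0000:03:00.0 bound to i915; any additional line would point to another GPU.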
I wrote a startup script that would check the kernel log and reboot if it saw the GPU wedged message, but now it's happening on every boot, so I'm caught in a boot loop.

I realize that I may have made a config mistake. The Intel GPU instructions under "Configure And Verify LP Mode On Linux" say: "This also applies to the bleeding edge hardware such as 12th Gen Intel processors, ARC GPU and newer but step 2 should be skipped." So the instructions say to skip step 2, which adds a kernel module option with this argument:
Code:
sudo sh -c "echo 'options i915 enable_guc=2' >> /etc/modprobe.d/i915.conf"

Once I realized this, I removed the i915.conf file and ran this again:
Code:
sudo update-initramfs -u && sudo update-grub

Can you confirm that doing this would have updated the kernel again and removed "options i915 enable_guc=2" from it, or could this be responsible for the problems I am having? I have limited knowledge of this area.

I ran sudo apt update && sudo apt install -y firmware-linux-nonfree to install the latest driver. If I uninstall this driver and revert to the original one, will hardware encoding work with this GPU? Does the HuC firmware exist in the 6.5.0-28-generic kernel, or do we definitely need this new driver?

Another thing I was curious about is the fan on this GPU: most of the time it does not spin. Every 20-30 seconds or so it spins for 10 seconds and then stops. When I do boot into a good state, even while transcoding, the fan spins more often and for longer, but it still keeps stopping. Is this normal? Does yours do this? I saw a post on Reddit from a user who described rewiring his fan because it was doing something similar, but I couldn't find any more info.

Also, I don't know how to get the GPU temperature on Ubuntu 22.04. The intel_gpu_top tool does not show temperature. Is there a way to do it? Thanks.
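To confirm the `enable_guc=2` option is really gone, one can look for leftover `options i915 ... enable_guc` lines under `/etc/modprobe.d` and read the value the loaded driver reports in `/sys/module/i915/parameters/enable_guc` (reading it typically requires root). A sketch; `find_enable_guc` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print any modprobe config lines that still pass
# enable_guc to i915, as "file:line". Defaults to /etc/modprobe.d.
# Commented-out lines are ignored because they must start with "options".
find_enable_guc() {
    conf_dir=${1:-/etc/modprobe.d}
    grep -rH '^[[:space:]]*options[[:space:]][[:space:]]*i915.*enable_guc' "$conf_dir" 2>/dev/null
}

# Example usage:
#   find_enable_guc || echo "no enable_guc option left in /etc/modprobe.d"
#   sudo cat /sys/module/i915/parameters/enable_guc   # value the loaded i915 is using
```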
RE: New system boots with a380 GPU wedged and will not playback video - TheDreadPirate - 2024-04-29

I'm on 24.04 with a newer version of intel_gpu_top, and it still doesn't report temps.

I pretty much didn't have to do any of those LP steps with Arc. I just enabled Low Power encoding in Jellyfin. No issues with transcoding or tone mapping.
Code:
[ 4.378986] i915 0000:03:00.0: [drm] VT-d active for gfx access

I can't really find anything conclusive online. Ensure your board's BIOS is up to date. Enable Resizable BAR in your BIOS. Try turning off SR-IOV in your BIOS.

RE: New system boots with a380 GPU wedged and will not playback video - aj_pinner - 2024-04-29

Okay, so with 24.04 you wouldn't have to install the linux-firmware package or the HWE kernel; everything should be fully supported, I think?

I just tried booting from USB as a test, using the latest KDE Neon, which is based on Ubuntu 22.04.4 and so has the same kernel version, without any of those mods, and saw the GPU wedged error as well. That more or less rules out my config. I think I should try updating to 24.04 and see what happens. Did you upgrade or do a clean install? I hear a lot of horror stories about upgrading to 22.04.

In the BIOS, Resizable BAR is on; I will try disabling SR-IOV. How about your fan: does it spin as I described, or is it more constant?

RE: New system boots with a380 GPU wedged and will not playback video - TheDreadPirate - 2024-04-29

When I upgraded to 24.04 it was a clean install, only because I was also upgrading the SSD (oooooold Intel 160GB SATA2 SSD to the NVMe SSD in my signature).

Correct. 24.04 is on kernel 6.8 by default, so it fully supports Arc out of the box. The linux-firmware package was already installed out of the box.

I have not peeked inside my case to check the GPU fan, nor do I care. My server sits in my utility closet doing its thing.

RE: New system boots with a380 GPU wedged and will not playback video - aj_pinner - 2024-04-29

SR-IOV was already off, so that wasn't it. Also, what size power supply do you have?
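Whether Resizable BAR is actually in effect can be read from the card's PCI capabilities rather than relying on the BIOS menu alone. A sketch, assuming the card sits at `03:00.0` as in the kernel log lines in this thread (the address may differ) and that `lspci -vv` is run as root so the capability list is visible; `rebar_sizes` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vv` output on stdin and print the BAR
# size lines listed under the "Resizable BAR" capability, without indentation.
rebar_sizes() {
    awk '
        # Each "Capabilities:" header starts a new block; track whether it is ReBAR.
        /Capabilities:/ { in_rebar = ($0 ~ /Resizable BAR/) }
        in_rebar && /BAR [0-9]+: current size:/ {
            sub(/^[ \t]+/, "")   # drop lspci indentation
            print
        }
    '
}

# Example usage:
#   sudo lspci -vv -s 03:00.0 | rebar_sizes
```

A `current size` well above 256MB generally indicates that Resizable BAR is active for that BAR.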
I have a 400W unit, which should be enough: 12th-gen i5 F-series CPU, nothing overclocked, no extra fans or other peripherals, no SATA drives, stock CPU cooler. I wonder if it is enough for this ASRock A380 card.

RE: New system boots with a380 GPU wedged and will not playback video - aj_pinner - 2024-04-29

This is my goal: to get the server stable enough to sit there and do its thing without my intervention. When it boots successfully, as far as I can tell it works. CPU usage seems a little higher than expected when I am transcoding, but the fps shows 600+, so that seems positive.

Unfortunately, when it throws this GPU wedged error it doesn't work at all, so I can't tell whether it's a software problem or a hardware problem. After something like 50 failed boots, I just had a successful one:
Code:
[ 1.447596] i915 0000:03:00.0: vgaarb: deactivate vga console

It's very inconsistent, which makes me think of a timing issue, which could be either a software or a hardware problem.

RE: New system boots with a380 GPU wedged and will not playback video - TheDreadPirate - 2024-04-29

The A380 is a 75W(?) GPU, and transcoding does not use that much power. Your PSU is plenty. Try reseating the GPU and power cables; maybe it is not fully seated or something.

RE: New system boots with a380 GPU wedged and will not playback video - aj_pinner - 2024-04-29

Yes, my thinking as well: not the PSU. I will try reseating. I did notice a difference between your kernel log and mine. On the last successful boot I see these lines:
Code:
[ 4.918698] mei i915.mei-gscfi.768-e2c2afa2-3817-4d19-9d95-06b16b588a5d: cannot connect

So it says it cannot connect, it resets, there is an unexpected reset, it resets again, and then it is successful. In this state it works fully, but your log shows only the single successful line at the end of the sequence. I have always seen this behavior: it always errors twice and then connects successfully on the 3rd try, so I was thinking of some sort of timing issue.
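To see whether the "errors twice, connects on the third try" pattern differs between good and wedged boots, the `cannot connect` lines from `i915.mei-gscfi` can be counted per boot. A sketch assuming persistent journald storage, so `journalctl -k -b -N` can reach earlier boots; `count_gsc_connect_failures` is a hypothetical helper, and its match string is copied from the log line above, so differently worded messages would be missed.

```shell
#!/bin/sh
# Hypothetical helper: read one boot's kernel log on stdin and print how many
# "cannot connect" lines were logged for i915.mei-gscfi devices
# (prints 0 when there are none).
count_gsc_connect_failures() {
    grep -c 'i915\.mei-gscfi.*cannot connect'
}

# Example usage: the current boot and the four before it
#   for b in 0 -1 -2 -3 -4; do
#       printf 'boot %s: %s\n' "$b" "$(journalctl -k -b "$b" | count_gsc_connect_failures)"
#   done
```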
Have you ever seen this before, or did yours always connect successfully the first time?
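Because a good boot and a wedged boot look alike until playback fails, a quick VA-API query after boot may help tell them apart. This sketch assumes the `vainfo` tool from the Ubuntu package of the same name, that `/dev/dri/renderD128` is the A380 (the only GPU in this build), and that a wedged GPU shows up as vainfo failing or listing no low-power encode entry points; that last assumption is untested. `VAEntrypointEncSliceLP` is the VA-API low-power encode entry point; the helper name `lp_encode_profiles` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read vainfo output on stdin and print the profiles that
# offer the low-power encode entry point (VAEntrypointEncSliceLP).
# Exits 1 if no such profile is found.
lp_encode_profiles() {
    awk '
        /VAEntrypointEncSliceLP/ {
            sub(/^[ \t]+/, "")                  # drop leading indentation
            split($0, parts, /[ \t]*:[ \t]*/)   # "VAProfileX : VAEntrypointY"
            print parts[1]
            n++
        }
        END { exit (n > 0 ? 0 : 1) }
    '
}

# Example usage on the host:
#   vainfo --display drm --device /dev/dri/renderD128 | lp_encode_profiles
```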