CUDA updates
CUDA updates - k5rqo - 2024-06-23

Hi, I know this is not fully related to Jellyfin, but I don't know where else I'd ask, so I'm asking my question here. I have Jellyfin running in a Docker container using the official Docker image, and I am passing through my NVIDIA Tesla GPU as described in the Jellyfin documentation for GPU passthrough. I have the correct drivers and nvidia-container-toolkit installed on my host (Debian Bookworm). This works fine most of the time, but sometimes ffmpeg fails, saying there is no CUDA device available. I have attributed this to the drivers being updated on the host by unattended-upgrades, but whenever I get the ffmpeg error, I can't find any logs of any NVIDIA component being updated. Am I missing something here?

RE: CUDA updates - TheDreadPirate - 2024-06-23

I remember another user had this problem months ago. I don't recall what the solution was, if one was even found. And I can't find the thread at the moment.

RE: CUDA updates - pcm - 2024-06-24

I'd start at syslog and dmesg in the container to see what's going on when the error happens. If there's nothing in the container's syslog/dmesg, then I'd check the host's dmesg. Another thing you could do is enable nvlog.

Quote: I have attributed this to the drivers being updated on the host by unattended-upgrades, but whenever I get the ffmpeg error, I can't find any logs of any NVIDIA component being updated.

IMHO, unattended upgrades should not cause such behavior (at least it doesn't for me, and I am way behind on the upgrades for my GPU)... It could be an actual hardware issue (with your specific GPU), or it could be a bug with your specific GPU device driver (either in the passthrough module or somewhere else).

RE: CUDA updates - k5rqo - 2024-06-24

(2024-06-24, 04:29 PM)pcm Wrote: I'd start at syslog and dmesg in the container

I don't think the container allows this, as it's good practice to lock containers down as much as possible.

(2024-06-24, 04:29 PM)pcm Wrote: Another thing you could do is enable nvlog.

I can't find anything about this online, could you explain a bit more?

(2024-06-24, 04:29 PM)pcm Wrote: IMHO, unattended upgrades should not cause such behavior... It could be an actual hardware issue (with your specific GPU) or could be a bug with your specific GPU device driver (either in the passthru module or somewhere else)...

I do actually think this could be caused by a driver upgrade. The container has a loaded library that communicates with the device Docker passes through; if the host driver suddenly changes, the library can't communicate with the GPU anymore, as it is suddenly using a mismatched driver.

RE: CUDA updates - pcm - 2024-06-24

Now that you mention it, that does make sense. But wouldn't restarting the container fix the issue? Containers are meant to be ephemeral anyway... Does the host machine capture any dmesg logs?

It's the nvidia-debugdump command. I just had an alias set up... my bad.
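If you want a quick way to confirm the mismatch from the outside, something like this should do (the container name "jellyfin" is just an example, substitute your own):

Code:
# Check whether the container can still reach the GPU; after a host
# driver upgrade this typically fails with "Failed to initialize
# NVML: Driver/library version mismatch".
docker exec -it jellyfin nvidia-smi

# On the host, look for NVIDIA kernel module (NVRM) messages around
# the time of the ffmpeg failure.
sudo dmesg | grep -i nvrm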
RE: CUDA updates - k5rqo - 2024-06-24

(2024-06-24, 08:16 PM)pcm Wrote: Now that you mention it, that does make sense. But wouldn't restarting the container fix the issue?

Yes, that does fix it, but my problem is that I want to know what causes the sudden driver update. :)

(2024-06-24, 08:16 PM)pcm Wrote: Does the host machine capture any dmesg logs?

I'll try to spot something next time it occurs.

(2024-06-24, 08:16 PM)pcm Wrote: It's the nvidia-debugdump command.

All good, I'll try that too.
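In the meantime I'll also grep dpkg's own log on the host, since that persists independently of whatever unattended-upgrades writes (paths below are the Debian defaults):

Code:
# Any NVIDIA package installs/upgrades recorded by dpkg itself
grep -i nvidia /var/log/dpkg.log

# Same search across older, rotated logs (zgrep also reads plain files)
zgrep -i nvidia /var/log/dpkg.log.*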
RE: CUDA updates - CleverId10t - 2024-06-26

I have experienced this, and turning off auto updates "fixed" it (as did a reboot of the Docker host). As I had a simple solution (turning off auto updates), I didn't bother investigating further.

RE: CUDA updates - k5rqo - 2024-06-27

(2024-06-26, 09:31 PM)CleverId10t Wrote: I have experienced this, and turning off auto updates "fixed" it (as did a reboot of the docker host).

What method did you use for auto updates?

RE: CUDA updates - k5rqo - 2024-06-30

I just encountered the issue again. It seems I wasn't able to find previous auto-installations of NVIDIA-related packages because the unattended-upgrades log gets overwritten each time unattended-upgrade runs. I will now blacklist these packages from auto-updating by doing the following (the patterns below are what I'd use; adjust them to whatever driver packages are installed on your system):

Code:
# /etc/apt/apt.conf.d/50unattended-upgrades
# Blacklist entries are regexes matched against package names.
Unattended-Upgrade::Package-Blacklist {
    "nvidia-";
    "libnvidia-";
};

Even if one of you would like to keep NVIDIA auto-updated, it won't work nicely with unattended-upgrades (it's NVIDIA, after all). I think the best solution for everyone is to update these packages manually once in a while.
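An alternative, if you'd rather not edit 50unattended-upgrades, is to put the packages on hold with apt-mark, which blocks upgrades from apt entirely (including unattended-upgrades) until you release the hold. The package names below are only examples; check dpkg -l | grep nvidia for what is actually installed:

Code:
# Prevent apt from upgrading the driver packages at all
sudo apt-mark hold nvidia-driver nvidia-kernel-dkms

# When updating manually later, release the hold first
sudo apt-mark unhold nvidia-driver nvidia-kernel-dkms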