    CUDA updates

    k5rqo
    Offline

    Junior Member

    Posts: 7
    Threads: 2
    Joined: 2024 Apr
    Reputation: 0
    #1
    2024-06-23, 08:41 PM
    Hi, I know this is not fully related to Jellyfin, but I don't know where else I'd ask, so I'm asking my question here.

    I have Jellyfin running in a Docker container using the official Docker image, and I am passing through my NVIDIA Tesla GPU as described in the Jellyfin documentation for GPU passthrough. I have the correct drivers and nvidia-container-toolkit installed on my host (Debian bookworm).

    This works fine most of the time, but sometimes ffmpeg fails, saying there is no CUDA device available. I have attributed this to the drivers being updated on the host by unattended-upgrades, but whenever I get the ffmpeg error, I can't find any logs of any NVIDIA component being updated.

    Am I missing something here?
    TheDreadPirate
    Offline

    Community Moderator

    Posts: 15,374
    Threads: 10
    Joined: 2023 Jun
    Reputation: 460
    Country:United States
    #2
    2024-06-23, 10:52 PM
    I remember another user had this problem months ago. I don't recall what the solution was, if one was even found. And I can't find the thread at the moment.
    Jellyfin 10.10.7 (Docker)
    Ubuntu 24.04.2 LTS w/HWE
    Intel i3 12100
    Intel Arc A380
    OS drive - SK Hynix P41 1TB
    Storage
        4x WD Red Pro 6TB CMR in RAIDZ1
    pcm
    Offline

    Member

    Posts: 62
    Threads: 4
    Joined: 2024 May
    Reputation: 0
    Country:Uzbekistan
    #3
    2024-06-24, 04:29 PM (This post was last modified: 2024-06-24, 04:35 PM by pcm. Edited 2 times in total.)
    I'd start with syslog and dmesg in the container to see what's going on when the error happens. If there's nothing in the container's syslog or dmesg, then I'd check the host's dmesg.

    Another thing you could do is enable nvlog.

    Quote: I have attributed this to the drivers being updated on the host by unattended-upgrades, but whenever I get the ffmpeg error, I can't find any logs of any NVIDIA component being updated.

    IMHO an unattended upgrade should not cause such behavior (at least not for me, and I am way behind on upgrades for my GPU)... It could be an actual hardware issue (with your specific GPU) or a bug in your specific GPU device driver (either in the passthrough module or somewhere else)...
    k5rqo
    Offline

    Junior Member

    Posts: 7
    Threads: 2
    Joined: 2024 Apr
    Reputation: 0
    #4
    2024-06-24, 07:06 PM
    (2024-06-24, 04:29 PM)pcm Wrote: I'd start with syslog and dmesg in the container to see what's going on when the error happens. If there's nothing in the container's syslog or dmesg, then I'd check the host's dmesg.
    I don't think the container allows this, as it's good practice to lock containers down as much as possible.

    (2024-06-24, 04:29 PM)pcm Wrote: Another thing you could do is enable nvlog.
    I can't find anything about this online. Could you explain a bit more?

    (2024-06-24, 04:29 PM)pcm Wrote: IMHO an unattended upgrade should not cause such behavior (at least not for me, and I am way behind on upgrades for my GPU)... It could be an actual hardware issue (with your specific GPU) or a bug in your specific GPU device driver (either in the passthrough module or somewhere else)...
    I do actually think this could be caused by a driver upgrade. The container has a loaded library that communicates with the GPU device passed through by Docker. If the host driver suddenly changes, the library can't communicate with the GPU anymore, because its version no longer matches the driver.
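
One way to test this mismatch theory on the host could be to compare the version of the loaded kernel module with the version of the CUDA library on disk. The following is only a rough Python sketch: it assumes the kernel module reports itself on an `NVRM version:` line in `/proc/driver/nvidia/version` and that the library is named like `libcuda.so.<version>` somewhere under `/usr/lib`. The function names and the search root are illustrative and not taken from this thread.

```python
import re
from pathlib import Path

# Assumed paths: where the loaded kernel module reports its version, and a
# root directory to search for the userspace CUDA library on the host.
PROC_VERSION = Path("/proc/driver/nvidia/version")
LIB_ROOT = Path("/usr/lib")


def parse_nvrm_version(text):
    """Return the version on the 'NVRM version:' line, or None if absent."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", line)
            return match.group(1) if match else None
    return None


def library_version(filename):
    """Extract '535.183.01' from a name like 'libcuda.so.535.183.01'."""
    match = re.search(r"\.so\.(\d+\.\d+(?:\.\d+)?)$", filename)
    return match.group(1) if match else None


def check_host():
    """Print both versions and return True only if they agree."""
    kernel = None
    if PROC_VERSION.exists():
        kernel = parse_nvrm_version(PROC_VERSION.read_text())
    # Collect every distinct versioned libcuda found under the search root.
    libs = sorted(
        {v for p in LIB_ROOT.rglob("libcuda.so.*") if (v := library_version(p.name))}
    )
    print(f"loaded kernel module: {kernel}")
    print(f"libcuda on disk:      {libs}")
    return kernel is not None and libs == [kernel]
```

Calling `check_host()` right after the ffmpeg error shows up would indicate whether the packages on disk have moved ahead of the module that is still loaded. If the two versions agree, the mismatch theory becomes less likely.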
    pcm
    Offline

    Member

    Posts: 62
    Threads: 4
    Joined: 2024 May
    Reputation: 0
    Country:Uzbekistan
    #5
    2024-06-24, 08:16 PM (This post was last modified: 2024-06-24, 08:25 PM by pcm. Edited 1 time in total.)
    Now that you mention it, that does make sense.
    But wouldn't restarting the container fix the issue? Containers are meant to be ephemeral anyway...
    Does the host machine capture any dmesg logs?

    It's the nvidia-debugdump command. I just had an alias set up... my bad.
    k5rqo
    Offline

    Junior Member

    Posts: 7
    Threads: 2
    Joined: 2024 Apr
    Reputation: 0
    #6
    2024-06-24, 09:25 PM
    (2024-06-24, 08:16 PM)pcm Wrote: Now that you mention it, that does make sense.
    But wouldn't restarting the container fix the issue? Containers are meant to be ephemeral anyway...
    Yes, that does fix it, but my problem is that I want to know what causes the sudden driver update. :)

    (2024-06-24, 08:16 PM)pcm Wrote: Does the host machine capture any dmesg logs?
    I'll try to spot something next time it occurs.

    (2024-06-24, 08:16 PM)pcm Wrote: It's the nvidia-debugdump command. I just had an alias set up... my bad.
    All good, I'll try that too.
    CleverId10t
    Offline

    Junior Member

    Posts: 22
    Threads: 4
    Joined: 2023 Jun
    Reputation: 0
    #7
    2024-06-26, 09:31 PM
    I have experienced this, and turning off auto updates "fixed" it (as did a reboot of the docker host).

    As I had a simple solution (turning off auto update), I didn't bother investigating further.
    k5rqo
    Offline

    Junior Member

    Posts: 7
    Threads: 2
    Joined: 2024 Apr
    Reputation: 0
    #8
    2024-06-27, 08:19 AM
    (2024-06-26, 09:31 PM)CleverId10t Wrote: I have experienced this, and turning off auto updates "fixed" it (as did a reboot of the docker host).

    What method did you use for auto updates?
    k5rqo
    Offline

    Junior Member

    Posts: 7
    Threads: 2
    Joined: 2024 Apr
    Reputation: 0
    #9
    2024-06-30, 06:51 AM
    I just encountered the issue again. It seems I wasn't able to find previous automatic installations of NVIDIA-related packages because the unattended-upgrade log would be overwritten each time unattended-upgrade ran. I will now blacklist these packages from auto-updating by doing the following:

    Code:
    // /etc/apt/apt.conf.d/50unattended-upgrades
    // Skip every package whose name matches this regular expression.
    Unattended-Upgrade::Package-Blacklist {
           ".*nvidia.*";
    };

    Even if one of you would like to keep the NVIDIA packages auto-updated, it won't work nicely with unattended-upgrade (it's NVIDIA after all). I think the best solution for everyone is to update them manually once in a while.
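
For anyone who wants to confirm after the fact that an NVIDIA package really was upgraded, apt's own history log (`/var/log/apt/history.log`) may still hold the record even when the unattended-upgrade log does not. Below is a minimal Python sketch, assuming the usual `Start-Date:` and `Upgrade: name:arch (old, new), ...` layout of that file. The function name and the simple substring match on `nvidia` are illustrative choices.

```python
import re

# One "name:arch (old, new)" entry on an apt history "Upgrade:" line.
ENTRY = re.compile(r"([^\s,:]+):(\S+) \(([^,)]+), ([^)]+)\)")


def nvidia_upgrades(history_text):
    """Return (start_date, package, old_version, new_version) tuples."""
    found = []
    start = None
    for line in history_text.splitlines():
        if line.startswith("Start-Date:"):
            # Normalise the double space apt puts between date and time.
            start = " ".join(line.split(":", 1)[1].split())
        elif line.startswith("Upgrade:"):
            for pkg, _arch, old, new in ENTRY.findall(line[len("Upgrade:"):]):
                if "nvidia" in pkg:
                    found.append((start, pkg, old, new))
    return found
```

On the host, passing the contents of `/var/log/apt/history.log` to `nvidia_upgrades()` should list any matching upgrades with their start time, which can then be compared against the time of the ffmpeg error. Rotated copies of the log are usually gzipped and would need to be read with `gzip.open` instead.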