Docker - lost nvidia/cuda after power outage
Jellyfin Forum (https://forum.jellyfin.org) > Support > Troubleshooting
Docker - lost nvidia/cuda after power outage - turbochamp - 2024-09-28

My working docker container is suddenly throwing cuda errors after a power outage. Nothing else has changed, worked for months till this damn storm. nvidia-container-toolkit is installed, any thoughts?

Quote:
[AVHWDeviceContext @ 0x64df00579a40] Cannot load libcuda.so.1

And here's my docker-compose:

Quote:
services:

RE: Docker - lost nvidia/cuda after power outage - TheDreadPirate - 2024-09-28

What is the output of nvidia-smi in the container?

Code:
docker exec -it jellyfin nvidia-smi

RE: Docker - lost nvidia/cuda after power outage - turbochamp - 2024-09-28

(2024-09-28, 03:14 AM)TheDreadPirate Wrote: What is the output of nvidia-smi in the container?

Thanks for quick reply, here is the output:

Code:
OCI runtime exec failed: exec failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown

RE: Docker - lost nvidia/cuda after power outage - TheDreadPirate - 2024-09-28

Try reinstalling the nvidia container toolkit.

RE: Docker - lost nvidia/cuda after power outage - turbochamp - 2024-09-28

(2024-09-28, 03:53 AM)TheDreadPirate Wrote: Try reinstalling the nvidia container toolkit.

Reinstalled nvidia-container-toolkit and restarted docker (service). However same output/issue.

RE: Docker - lost nvidia/cuda after power outage - TheDreadPirate - 2024-09-28

Code:
sudo apt list --installed | egrep -i "nvidia|libnv|cuda"

RE: Docker - lost nvidia/cuda after power outage - turbochamp - 2024-09-28

(2024-09-28, 04:27 AM)TheDreadPirate Wrote:

I am on Arch, but with

Code:
pacman -Qi nvidia libnv cuda

Quote:
pacman -Qi nvidia libnv cuda

RE: Docker - lost nvidia/cuda after power outage - crobibero - 2024-09-28

Try updating your docker-compose to specify nvidia devices.
Formatting may be off since I pasted from my phone

Code:
deploy:

RE: Docker - lost nvidia/cuda after power outage - turbochamp - 2024-09-28

(2024-09-28, 09:43 AM)crobibero Wrote: Try updating your docker-compose to specify nvidia devices.

I get this error when restarting docker. Line 30 is the "count: all" line. Here are my changes:

Code:
yaml: line 30: mapping values are not allowed in this context

Code:
deploy:

RE: Docker - lost nvidia/cuda after power outage - turbochamp - 2024-09-28

I guess my formatting was off. It now appears to be fixed, it works! Here it is with the right formatting:

Code:
deploy:

Then I ran docker-compose down, restarted the docker service, and ran docker-compose up.
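
For reference, the deploy blocks in this thread are cut off in the printable view. Below is a minimal sketch of what such a block commonly looks like, assuming the GPU device reservation syntax described in the Docker Compose documentation. The service name jellyfin is taken from the docker exec command earlier in the thread; everything else is an assumption and not necessarily turbochamp's exact file.

```yaml
# Sketch only: assumes the Compose GPU reservation syntax, not the poster's actual file.
services:
  jellyfin:                        # service name assumed from "docker exec -it jellyfin"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia       # request devices through the NVIDIA runtime
              count: all           # the "count: all" line mentioned in the thread
              capabilities: [gpu]  # Compose requires capabilities to be set
```

The "mapping values are not allowed in this context" error typically means the indentation is off: YAML wants spaces rather than tabs, and each nested key has to sit consistently under its parent, which is easy to break when pasting from a phone.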