Thread Pool Starvation - Printable Version

+- Jellyfin Forum (https://forum.jellyfin.org)
+-- Forum: Support (https://forum.jellyfin.org/f-support)
+--- Forum: Troubleshooting (https://forum.jellyfin.org/f-troubleshooting)
+--- Thread: Thread Pool Starvation (/t-thread-pool-starvation)
Thread Pool Starvation - natzilla - 2023-09-04

I am facing a situation where my system is being flooded with processes from /usr/bin/jellyfin. There are hundreds, possibly thousands, of these entries in htop. Something is obviously hung here, and I'd like to know some ways to investigate it further. I have not rebooted the server, which in my experience does clear it, but I want to root cause this first.

More details confirming that CPU is being starved:

Code:
● jellyfin.service - Jellyfin Media Server
     Loaded: loaded (/lib/systemd/system/jellyfin.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/jellyfin.service.d
             └─jellyfin.service.conf
     Active: active (running) since Fri 2023-09-01 19:29:02 UTC; 2 days ago
   Main PID: 732 (jellyfin)
      Tasks: 3636 (limit: 18546)
     Memory: 13.7G
        CPU: 2d 19h 15min 58.966s
     CGroup: /system.slice/jellyfin.service
             └─732 /usr/bin/jellyfin --webdir=/usr/share/jellyfin/web --restartpath=/usr/lib/jellyfin/restart.sh --ffmpeg=/usr/lib/jellyfin-ffmpeg/ffmpeg

Sep 04 14:59:29 jellyfin jellyfin[732]: [14:59:29] [WRN] As of "09/04/2023 14:59:09 +00:00", the heartbeat has been running for "00:00:20.7576155" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 14:59:50 jellyfin jellyfin[732]: [14:59:50] [WRN] As of "09/04/2023 14:59:31 +00:00", the heartbeat has been running for "00:00:10.8371778" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:00:13 jellyfin jellyfin[732]: [15:00:13] [WRN] As of "09/04/2023 14:59:51 +00:00", the heartbeat has been running for "00:00:21.2132112" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:00:42 jellyfin jellyfin[732]: [15:00:42] [WRN] As of "09/04/2023 15:00:22 +00:00", the heartbeat has been running for "00:00:20.3549039" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:00:53 jellyfin jellyfin[732]: [15:00:53] [WRN] As of "09/04/2023 15:00:43 +00:00", the heartbeat has been running for "00:00:10.0880467" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:01:16 jellyfin jellyfin[732]: [15:01:16] [WRN] As of "09/04/2023 15:00:55 +00:00", the heartbeat has been running for "00:00:21.4189816" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:01:25 jellyfin jellyfin[732]: [15:01:25] [WRN] As of "09/04/2023 15:01:18 +00:00", the heartbeat has been running for "00:00:07.2229611" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:01:37 jellyfin jellyfin[732]: [15:01:37] [WRN] As of "09/04/2023 15:01:26 +00:00", the heartbeat has been running for "00:00:10.6905181" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:01:39 jellyfin jellyfin[732]: [15:01:39] [WRN] As of "09/04/2023 15:01:38 +00:00", the heartbeat has been running for "00:00:01.5668541" which is longer than "00:00:01". This could be caused by thread pool starvation.
Sep 04 15:01:43 jellyfin jellyfin[732]: [15:01:43] [WRN] As of "09/04/2023 15:01:41 +00:00", the heartbeat has been running for "00:00:02.4147117" which is longer than "00:00:01". This could be caused by thread pool starvation.

System details:

Code:
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

NVIDIA-SMI 525.125.06    Driver Version: 525.125.06    CUDA Version: 12.0
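For what it's worth, the Tasks figure above (3636) suggests those htop entries are threads of the single Jellyfin process rather than separate processes; htop lists userland threads as their own rows by default (the H key toggles them). A quick way to confirm, using the Main PID 732 from the status output:

Code:
# Count the threads belonging to the main Jellyfin process (PID 732 above)
grep Threads /proc/732/status
ps -p 732 -o nlwp=

If that count roughly matches the number of rows in htop, it is one process with a runaway thread pool, which lines up with the heartbeat warnings.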
RE: Thread Pool Starvation - TheDreadPirate - 2023-09-04

Can you describe your setup? Number of users, GPU used for transcoding, storage for the VM/container, is this storage local or remote? Some local, some remote?

RE: Thread Pool Starvation - natzilla - 2023-09-04

(2023-09-04, 03:27 PM)TheDreadPirate Wrote: Can you describe your setup? Number of users, GPU used for transcoding, storage for the VM/container, is this storage local or remote? Some local, some remote?

Day to day the number of active users could be 4-6, but mostly it is around 2-3. The GPU is a Quadro P400 - not everything needs to transcode, but I did install the patch for unlocking the session limit a while ago. Storage for this VM is 200GB for the system; media storage is a local NFS share.

RE: Thread Pool Starvation - Venson - 2023-09-04

Although I cannot put my thumb on it, there seems to be something fundamentally wrong with this setup. I see some ffmpeg processes crashing for no apparent reason, lots of network issues with corrupt packets, the playback tracker not being cleaned up, and more. Also chapter extractions being aborted. I don't think it's actually Jellyfin's issue, but you have somehow started tons of JF instances.

RE: Thread Pool Starvation - natzilla - 2023-09-04

(2023-09-04, 03:32 PM)Venson Wrote: Although I cannot put my thumb on it, there seems to be something fundamentally wrong with this setup. I see some ffmpeg processes crashing for no apparent reason, lots of network issues with corrupt packets, the playback tracker not being cleaned up, and more. Also chapter extractions being aborted.

Your comment made me think it might be requests coming from my reverse proxy, but I paused that container and it had no effect. I am watching the CPU counter drop and then shoot back up, so you are right.

RE: Thread Pool Starvation - TheDreadPirate - 2023-09-04

(2023-09-04, 03:30 PM)natzilla Wrote: Storage for this VM is 200GB for the system

Can you get more specific about the 200GB VM storage? What I'm trying to get at is whether the storage is local and what file system it uses. All of the problems here, including what Venson mentioned, tell me that there is an issue with disk I/O and throughput. How many VMs are you running on this machine?

RE: Thread Pool Starvation - natzilla - 2023-09-04

(2023-09-04, 04:08 PM)TheDreadPirate Wrote:
(2023-09-04, 03:30 PM)natzilla Wrote: Storage for this VM is 200GB for the system

Sure. Jellyfin is currently the only VM running on this specific drive in my hypervisor; I have other disks for other VMs but kept Jellyfin on its own. The drive is a Samsung 870 EVO - 500GB total capacity, but I limited the VM to 200GB. Storage is fully local to the hypervisor; it should be ext4 in the guest, and the VM disk is raw.

RE: Thread Pool Starvation - natzilla - 2023-09-04

The system appears to have calmed down now. I didn't do anything to it at all, so I am at a loss. I checked the scheduled tasks page for anything that was running, and everything there finished hours ago and took less than a minute.

Edit: I take that back, the issue returned.

RE: Thread Pool Starvation - natzilla - 2023-10-01

This still appears to be a problem after the 10.8.11 update. It is still very random, and I'm not sure what's causing it.
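Since the disk I/O theory from earlier in the thread was never ruled out, one low-effort check the next time the CPU spikes is to watch device utilization and Jellyfin's own I/O side by side. A minimal sketch, assuming the sysstat package is installed in the VM and 732 is still the main Jellyfin PID:

Code:
# Extended per-device stats every 5 seconds, 6 samples (watch %util and the await columns)
iostat -dx 5 6
# Read/write throughput of the Jellyfin process itself
pidstat -d -p 732 5 6
# Note: iostat only covers local block devices; the NFS media share is not included here

Sustained high %util or long waits on the 870 EVO during a spike would point at storage rather than at Jellyfin itself.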
RE: Thread Pool Starvation - pcm - 2024-06-05

I'm wondering if /usr/lib/jellyfin/restart.sh has something to do with it.

I'm taking a wild stab in the dark, but I'm thinking the jellyfin process somehow decides it is not healthy and keeps trying to restart itself using the restart.sh script. Someone familiar with how the --restartpath flag works might be able to weigh in better.

In the meantime, could you provide the last few lines of journalctl?

Code:
journalctl -u jellyfin -n 200 --no-pager
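The warnings can also be confirmed from the .NET side rather than the journal. A minimal sketch, assuming a .NET SDK is installed on the host so the dotnet-counters diagnostic tool can attach to the running Jellyfin process (PID 732 from the status output earlier):

Code:
# One-time install of the diagnostics tool (requires a .NET SDK)
dotnet tool install --global dotnet-counters
# Live view of thread pool thread count, queue length and lock contention
# (run as the same user as the Jellyfin service, or as root)
dotnet-counters monitor --process-id 732 System.Runtime

A ThreadPool Queue Length that keeps growing while the thread count climbs toward the Tasks figure from systemctl would confirm that work items are piling up faster than the pool can drain them.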