Jellyfin Forum
SOLVED: Subtitle extraction - Printable Version


RE: Subtitle extraction - wolfrumble4398 - 2024-05-04

(2024-05-03, 12:35 PM)TheDreadPirate Wrote: Yes and yes.  Though if deleted in file explorer Jellyfin won't delete the subtitles until the next scan or if real time monitoring is enabled.

Thanks for the clarification!


RE: Subtitle extraction - seseau1 - 2024-05-15

Hi, very sorry for necroing this thread!

I'm going through the same thing and trying to figure out how to match the extracted subtitles to the actual media. As stated earlier in the thread, the extracted subs are named in a seemingly random fashion that does seem to match the MD5 ID of the file, but I don't know how to match them. In Jellyfin, extracted subtitles don't appear in the subtitle options even after I've extracted them.

For more context, I am trying to make use of the dual subtitles feature which Jellyfin just implemented, but it only lets you select secondary subtitles if you select external subs to begin with for the primary subs. If I select embedded subs, I am not given the option to select secondary subtitles. As a result, I am trying to extract the embedded subs so I can use them as "external" subs and trigger secondary subs.

Thanks!


RE: Subtitle extraction - TheDreadPirate - 2024-05-15

Jellyfin keeps track of the extracted subtitles in the database (library.db). Same with images and other metadata that Jellyfin manages centrally.


RE: Subtitle extraction - sjorge - 2024-08-10

(2024-05-03, 12:35 PM)TheDreadPirate Wrote: Yes and yes.  Though if deleted in file explorer Jellyfin won't delete the subtitles until the next scan or if real time monitoring is enabled.

Sadly it doesn't look like they ever get cleaned up.

Code:
Aug 09 07:32:59 fqdn jellyfin[398659]: [07:32:59] [INF] ffmpeg subtitle extraction completed for /mm/Series/Rick and Morty (2013)/Rick and Morty - S07E05 - Unmortricken.mkv to /var/lib/jellyfin/data/subtitles/0/0e230db0-1f4f-93f3-f642-8b95ce07cb12.srt
...
Aug 10 14:43:23 fqdn jellyfin[398659]: [14:43:23] [INF] Removing item, Type: Series, Name: Rick and Morty, Path: /mm/Series/Rick and Morty (2013), Id: 2c6787f7-72aa-ef8d-588a-2ac214b2d1a0

One would expect the subtitles to be removed in this case. But it's still there.

Code:
root@node:~# ls -l /var/lib/jellyfin/data/subtitles/0/0e230db0-1f4f-93f3-f642-8b95ce07cb12.srt
-rw-r--r-- 1 jellyfin jellyfin 22685 Aug  9 07:32 /var/lib/jellyfin/data/subtitles/0/0e230db0-1f4f-93f3-f642-8b95ce07cb12.srt

It is still there even after two additional 'Scan All Libraries' runs, and after the scheduled tasks Subtitle Extract (I would not expect this to clean it up, looking at the code) and Clean Cache Directory (the most likely candidate) have run.

I also cannot find any reference in the code to where this gets cleaned up. The Subtitle Extract task just calls the function to extract subtitles, which first checks whether they already exist, and playback of an item requiring subs triggers the same call. But there is no delete happening anywhere, as far as I can see. There is also no reference to this in the database. I would perhaps have expected library.db -> mediastreams to have been updated with a complete Path field pointing to the cache, but this is also not the case.
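
For anyone who wants to check the Path field on their own install, a minimal Python sketch follows. The table name mediastreams and the Path field come from the observation above; the other column names (ItemId, StreamIndex, StreamType) and the 'Subtitle' value are assumptions about the schema and may differ between Jellyfin versions. The database is opened read-only so nothing gets modified.

```python
import sqlite3
from pathlib import Path


def subtitle_streams(db_path):
    """Return (ItemId, StreamIndex, Path) for every subtitle stream row.

    Column names other than Path are assumptions about the library.db schema.
    """
    # Open read-only via a file: URI so a running server's data is not touched.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return con.execute(
            "SELECT ItemId, StreamIndex, Path FROM mediastreams "
            "WHERE StreamType = 'Subtitle' "
            "ORDER BY ItemId, StreamIndex"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # Path taken from the output further down; adjust for your install.
    for item_id, index, path in subtitle_streams("/var/lib/jellyfin/data/library.db"):
        print(item_id, index, path)
```

If the Path column comes back empty for embedded subtitle streams, that matches what is described above: the cache location is not stored in the database.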

If I ever find a way to clean up old files here, I'll edit this post, but for now it seems to be very hard to do, aside from nuking everything in that dir and running Subtitle Extract again, which will then obviously not extract subtitles for the media that got removed.

Edit: writing a script to do it is not going to be easy 🙁 I tried something in Python and TypeScript (the two languages I know), but their date functions lack the required precision that C# has.

Edit 2:

Very much as-is as this is the first C# code I have written in over 10 years ...
https://gist.github.com/sjorge/db3661f5d3a79349a380da6cdc85eb4e

I made sure Subtitle Extract had run, dropped a .ignore file, and ran Scan All Libraries.
Then I ran the little program above. I'm not very happy with the code quality, but it gets the job done.

Code:
sjorge@node:~$ sudo -u jellyfin /var/lib/jellyfin/.cron/jf_subtitle_cache_cleaner
Opening /var/lib/jellyfin/data/library.db ...
Looking up subtitle mediastreams ...
Detected 35085 valid subtitle cache paths.
Subtitle cache: purged=88, kept=35085
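
The purge/keep step reported above could be sketched roughly like this in Python. This is not the code from the gist: it only covers comparing the files in the cache directory against a list of valid cache paths, and it takes that list as an input. Working out the valid paths is the hard part and is not attempted here. The function names split_cache and purge are made up for illustration.

```python
from pathlib import Path


def split_cache(cache_dir, valid_paths):
    """Split files under cache_dir into (orphans, kept).

    valid_paths is assumed to be computed elsewhere; this function only
    compares it with what is actually on disk.
    """
    valid = {Path(p).resolve() for p in valid_paths}
    orphans, kept = [], []
    for f in sorted(Path(cache_dir).rglob("*")):
        if not f.is_file():
            continue  # skip the bucket subdirectories themselves
        (kept if f.resolve() in valid else orphans).append(f)
    return orphans, kept


def purge(orphans, dry_run=True):
    """Delete orphaned files, or only list them when dry_run is True."""
    for f in orphans:
        if dry_run:
            print(f"would remove {f}")
        else:
            f.unlink()
    return len(orphans)
```

Defaulting to a dry run seems safer here, since a wrong list of valid paths would otherwise wipe subtitles that then have to be extracted again.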

I guess this could be cleaned up and turned into a separate scheduled task... not sure that would be acceptable though.

Edit 3: I filed an issue https://github.com/jellyfin/jellyfin-plugin-subtitleextract/issues/35