Cover image not extracted from EPUB files #100

godvino · 2024-02-27T19:47:39Z

Bookshelf fails to extract the cover image from some books in EPUB format. The book's title, description, and other details are loaded correctly, though.

One of the books that causes this issue is https://www.feedbooks.com/book/1421/the-adventures-of-sherlock-holmes

Looking in Jellyfin's metadata folder after the book is imported, I can see that a poster.jpg file was created, but it is not actually an image:

$ cat poster.jpg 
<?xml version="1.0" encoding="UTF-8" ?>

<!DOCTYPE html PUBLIC
     "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

  <head>
   <title>Cover</title>
   <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
  </head>
  
  <body>
    <div style="text-align: center; page-break-after: always;">
      <img src="images/cover.png" alt="Cover" style="height: 100%; max-width: 100%;" />
    </div>
  </body>

</html>
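
One way to confirm that a file like this is not a real image is to look at its first bytes instead of its extension. The following is a minimal Python sketch, not part of the plugin (which is written in C#); the function name sniff_image_type and the small signature table are assumptions made for illustration, and only JPEG, PNG, and GIF are covered.

```python
# Well-known leading byte signatures for a few common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return "jpeg", "png" or "gif" based on the leading bytes, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None
```

Reading the first 16 bytes of poster.jpg in binary mode and passing them to sniff_image_type would return None for the XHTML content shown above, since it starts with `<?xml` rather than an image signature.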


unfedorg · 2024-04-14T11:15:39Z

I have 754 EPUB files under Jellyfin management and only 5 of them had their cover image extracted correctly, so I looked into this issue and found 3 things that could be improved.

Treat calibre:series_index as Double

The first thing I noticed was the error messages below in the Jellyfin server log on every metadata refresh attempt.

[2024-04-14 11:01:41.673 +08:00] [ERR] Error converting to int32
System.FormatException: Input string was not in a correct format.
   at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
   at Jellyfin.Plugin.Bookshelf.Providers.OpfReader`1.ReadInt32AttributeInto(String xPath, Action`1 commitResult)

The plugin reads calibre:series_index and converts it to Int32, but this fails because calibre:series_index often contains a decimal point (e.g. "1.0").
Parsing it as a Double before converting to Int32 should solve most cases.
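
The idea can be sketched in Python (an illustration only, not the plugin's C# code; the helper name parse_series_index is hypothetical). Python's int("1.0") raises ValueError in much the same way as the Int32 parse above, so the value is read as a float first. Note that a fractional index such as "2.5" is truncated to 2 here, which is an assumption about the desired behaviour.

```python
import math


def parse_series_index(raw):
    """Parse a calibre:series_index value such as "1" or "1.0" into an int.

    Returns None when the value is missing or not a finite number.
    """
    if raw is None:
        return None
    try:
        value = float(raw.strip())  # accepts "1", "1.0", " 3.0 "
    except ValueError:
        return None
    if not math.isfinite(value):  # reject "nan" and "inf"
        return None
    return int(value)  # truncates toward zero: 2.5 -> 2
```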

jellyfin-plugin-bookshelf/Jellyfin.Plugin.Bookshelf/Providers/OpfReader.cs

Line 149 in 5baaa87

    
           ReadInt32AttributeInto("//opf:meta[@name='calibre:series_index']", index => book.IndexNumber = index);

Accept empty opfRootDirectory

The code checks opfRootDirectory and, if it is null or empty, gives up on extracting the image.
However, it is common for files to be placed at the root of the EPUB archive, in which case opfRootDirectory is an empty string. Simply accepting an empty string solves this issue.
This change alone fixed 93% of my EPUB files.

jellyfin-plugin-bookshelf/Jellyfin.Plugin.Bookshelf/Providers/Epub/EpubMetadataImageProvider.cs

Lines 104 to 108 in 5baaa87

    
           var opfRootDirectory = Path.GetDirectoryName(opfFilePath);
           if (string.IsNullOrEmpty(opfRootDirectory))
           {
               return new DynamicImageResponse { HasImage = false };
           }
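
To illustrate why an empty root directory is harmless, here is a small Python sketch (not the plugin's code; resolve_cover_entry is a hypothetical helper). It assumes archive entry names use forward slashes, so it uses posixpath rather than the platform path module.

```python
import posixpath


def resolve_cover_entry(opf_path, cover_href):
    """Resolve a manifest href relative to the OPF file's directory in the archive."""
    # dirname is "" when the OPF file sits at the archive root.
    opf_root = posixpath.dirname(opf_path)
    # Joining with an empty root is valid and simply yields the href itself.
    return posixpath.normpath(posixpath.join(opf_root, cover_href))
```

With this approach, an OPF at "content.opf" and an href of "cover.jpg" resolve to "cover.jpg", while an OPF at "OEBPS/content.opf" and an href of "images/cover.png" resolve to "OEBPS/images/cover.png", so no special case for the empty string is needed.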

Improve xPath

Image extraction fails to find the cover image if the EPUB file has another object with the id "cover".
This can be improved by restricting the match to objects whose media-type starts with "image/".

Also, in some of my EPUB files the id of the cover image is "my-cover-image" instead of just "cover-image".
I'm not sure whether this is common, but adding a wildcard may improve the chance of extracting the cover image.

jellyfin-plugin-bookshelf/Jellyfin.Plugin.Bookshelf/Providers/OpfReader.cs

Lines 57 to 67 in 5baaa87

    
           var coverId = ReadEpubCoverInto(opfRootDirectory, "//opf:item[@id='cover']");
           if (coverId is not null)
           {
               return coverId;
           }

           var coverImageId = ReadEpubCoverInto(opfRootDirectory, "//opf:item[@id='cover-image']");
           if (coverImageId is not null)
           {
               return coverImageId;
           }
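
The matching rule described above could look roughly like the following Python sketch, using the standard library's ElementTree. This is an illustration under assumptions, not the plugin's implementation: find_cover_href and the sample manifest are hypothetical, and the order of preference (exact ids first, then any id containing "cover") is one possible choice. In XPath 1.0 a similar filter could be written as //opf:item[contains(@id,'cover') and starts-with(@media-type,'image/')].

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_href(opf_xml):
    """Return the href of the most likely cover image in an OPF manifest, or None."""
    root = ET.fromstring(opf_xml)
    # Only manifest items whose media-type is an image are candidates.
    images = [
        item
        for item in root.findall(".//opf:manifest/opf:item", OPF_NS)
        if item.get("media-type", "").startswith("image/")
    ]
    # Exact ids first, so files that already work keep the same result.
    for wanted in ("cover", "cover-image"):
        for item in images:
            if item.get("id") == wanted:
                return item.get("href")
    # Fallback: any image item whose id contains "cover" (e.g. "my-cover-image").
    for item in images:
        if "cover" in item.get("id", ""):
            return item.get("href")
    return None


# Hypothetical manifest: an XHTML page takes the "cover" id, the image uses another id.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="cover" href="cover.xml" media-type="application/xhtml+xml"/>
    <item id="my-cover-image" href="images/cover.png" media-type="image/png"/>
  </manifest>
</package>"""

print(find_cover_href(SAMPLE_OPF))  # images/cover.png
```

Filtering on media-type first means an XHTML cover page with the id "cover" is skipped instead of being saved as the poster.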

With the above changes, I was able to extract cover images from 100% of my EPUB files.
I also tested with the https://www.feedbooks.com/book/1421/the-adventures-of-sherlock-holmes EPUB file and it works fine.

I will try to make a pull request for these changes.

Thanks!

sidney-eliot · 2024-11-12T19:26:41Z

PDF books have the same issue. It would be great if those could also generate their own cover image from the first page.

sidney-eliot · 2024-11-12T19:50:02Z

For the time being, this command will generate thumbnails from each PDF's first page. More specifically, it recursively looks through the folder where the command is executed and all folders below it, and creates a JPG image beside every PDF with the same name as the PDF. You can change the resolution and quality with -density ... and -quality ... (quality ranges from 0 to 100). I'm not sure at the moment how to make density use the original image resolution.

for /r %f in (*.pdf) do magick convert -density 120 -quality 70 "%f[0]" "%~dpnf.jpg"

Requires https://imagemagick.org and https://ghostscript.com to be installed. You can either add the installed magick to your PATH environment variable or use the full path to the exe in place of the word "magick" in the command. The command is written for the Windows cmd shell.
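
For other platforms, the same loop could be sketched in Python. This is an untested-on-real-PDFs illustration that mirrors the arguments of the cmd command above; the function names thumbnail_command and generate_thumbnails and the dry_run flag are hypothetical, and ImageMagick and Ghostscript are still assumed to be installed with magick on the PATH.

```python
import subprocess
from pathlib import Path


def thumbnail_command(pdf, density=120, quality=70):
    """Build the magick argument list for one PDF, mirroring the cmd one-liner."""
    out = pdf.with_suffix(".jpg")  # same folder and name as the PDF
    return [
        "magick", "convert",
        "-density", str(density),
        "-quality", str(quality),
        f"{pdf}[0]",  # "[0]" selects the first page
        str(out),
    ]


def generate_thumbnails(root, dry_run=True):
    """Collect (and optionally run) one command per PDF found under root."""
    commands = [thumbnail_command(p) for p in sorted(Path(root).rglob("*.pdf"))]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if magick reports an error
    return commands
```

Calling generate_thumbnails(".") only lists the commands; passing dry_run=False actually runs them.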

bt4ibwem8 · 2024-12-05T18:24:57Z

This issue can be closed if it was merged, right?

unfedorg mentioned this issue Apr 14, 2024

Improve cover image extraction from epub files #102

Merged
