Cover image not extracted from EPUB files #100

godvino opened this issue Feb 27, 2024 · 4 comments

godvino commented Feb 27, 2024

Bookshelf fails to extract the cover image from some books in EPUB format. The book's name, description, and other details are loaded correctly, though.

One of the books that causes this issue is https://www.feedbooks.com/book/1421/the-adventures-of-sherlock-holmes

Looking in Jellyfin's metadata folder after the book is imported, I can see that a poster.jpg file was created, but it is not actually an image. It contains the XHTML of the EPUB's cover page:

$ cat poster.jpg 
<?xml version="1.0" encoding="UTF-8" ?>

<!DOCTYPE html PUBLIC
     "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

  <head>
   <title>Cover</title>
   <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
  </head>
  
  <body>
    <div style="text-align: center; page-break-after: always;">
      <img src="images/cover.png" alt="Cover" style="height: 100%; max-width: 100%;" />
    </div>
  </body>

</html>

[Image: Screenshot from 2024-02-27 22-39-51]

@unfedorg (Contributor)

I have 754 EPUB files under Jellyfin management, and only 5 of them have their cover image extracted correctly, so I looked into this issue and found 3 things that could be improved.

  1. Treat calibre:series_index as Double
    The first thing I noticed was the following error in the Jellyfin server log on every metadata refresh attempt.
[2024-04-14 11:01:41.673 +08:00] [ERR] Error converting to int32
System.FormatException: Input string was not in a correct format.
   at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
   at Jellyfin.Plugin.Bookshelf.Providers.OpfReader`1.ReadInt32AttributeInto(String xPath, Action`1 commitResult)

The plugin reads calibre:series_index and converts it to Int32, but this fails because calibre:series_index often contains a decimal value (e.g. "1.0").
Parsing it as a Double before converting to Int32 should solve most cases. The current call is:

ReadInt32AttributeInto("//opf:meta[@name='calibre:series_index']", index => book.IndexNumber = index);
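A minimal sketch of the Double-then-Int32 idea, not the plugin's actual code. `SeriesIndexParser` and `ParseSeriesIndex` are hypothetical names used only for illustration, and the choice to truncate fractional indexes (rather than round) is an assumption.

```csharp
using System;
using System.Globalization;

public static class SeriesIndexParser
{
    // Hypothetical helper (not part of the Bookshelf plugin):
    // accepts values such as "1" or "1.0" and returns null when the text is not numeric.
    public static int? ParseSeriesIndex(string? raw)
    {
        // Invariant culture so that "1.0" parses the same way regardless of server locale.
        if (double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out var value)
            && value >= int.MinValue
            && value <= int.MaxValue)
        {
            // Assumption: fractional indexes are truncated toward zero ("2.5" becomes 2).
            return (int)value;
        }

        return null;
    }
}
```

Returning null instead of throwing would let a caller skip the index without logging an error on every refresh.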

  2. Accept an empty opfRootDirectory

The code checks opfRootDirectory and, if it is null or empty, gives up on extracting the image.
However, it is common for the image file to be placed at the root. Simply accepting an empty string solves this.
This change alone fixed 93% of my EPUB files. The current check is:

var opfRootDirectory = Path.GetDirectoryName(opfFilePath);
if (string.IsNullOrEmpty(opfRootDirectory))
{
    return new DynamicImageResponse { HasImage = false };
}
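As a rough illustration of accepting an empty root directory, the sketch below rejects only a null result and treats an empty string as "the cover path is already relative to the root". `OpfPaths` and `ResolveCoverPath` are hypothetical names, and the sketch assumes paths are archive entry names that use forward slashes; the plugin's real code may differ.

```csharp
using System;
using System.IO;

public static class OpfPaths
{
    // Hypothetical helper (not part of the Bookshelf plugin):
    // resolves a manifest href relative to the directory of the OPF file.
    public static string? ResolveCoverPath(string opfFilePath, string coverHref)
    {
        var opfRootDirectory = Path.GetDirectoryName(opfFilePath);

        // Only a null result is treated as a failure; an empty string is valid
        // and means the OPF file has no parent directory.
        if (opfRootDirectory is null)
        {
            return null;
        }

        // Assumption: entry names use '/', so normalize any '\' the platform introduced.
        return opfRootDirectory.Length == 0
            ? coverHref
            : opfRootDirectory.Replace('\\', '/') + "/" + coverHref;
    }
}
```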

  3. Improve the XPath

Image extraction fails to find the cover image if the EPUB file has another item with the id "cover".
This can be improved by matching only items whose media-type starts with "image/".

I also have some EPUB files where the id of the cover image is "my-cover-image" instead of just "cover-image".
I'm not sure whether this is a common case, but adding a wildcard may improve the chance of extracting the cover image. The current lookup is:

var coverId = ReadEpubCoverInto(opfRootDirectory, "//opf:item[@id='cover']");
if (coverId is not null)
{
    return coverId;
}

var coverImageId = ReadEpubCoverInto(opfRootDirectory, "//opf:item[@id='cover-image']");
if (coverImageId is not null)
{
    return coverImageId;
}
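One possible shape for the stricter lookup, as a sketch only: each query requires an "image/" media-type, and a final `contains(@id,'cover')` query acts as the wildcard for ids such as "my-cover-image". `CoverLookup` and `FindCoverHref` are hypothetical names, and the ordering of the queries is an assumption, not the change that was actually submitted.

```csharp
using System;
using System.Xml;

public static class CoverLookup
{
    // Namespace of OPF package documents.
    private const string OpfNamespace = "http://www.idpf.org/2007/opf";

    // Hypothetical helper (not part of the Bookshelf plugin):
    // returns the href of the first manifest item that looks like a cover image.
    public static string? FindCoverHref(string opfXml)
    {
        var document = new XmlDocument();
        document.LoadXml(opfXml);

        var namespaces = new XmlNamespaceManager(document.NameTable);
        namespaces.AddNamespace("opf", OpfNamespace);

        // Exact ids first, then any id containing "cover"; all restricted to images.
        string[] queries =
        {
            "//opf:item[@id='cover' and starts-with(@media-type,'image/')]",
            "//opf:item[@id='cover-image' and starts-with(@media-type,'image/')]",
            "//opf:item[contains(@id,'cover') and starts-with(@media-type,'image/')]",
        };

        foreach (var query in queries)
        {
            var href = document.SelectSingleNode(query, namespaces)?.Attributes?["href"]?.Value;
            if (!string.IsNullOrEmpty(href))
            {
                return href;
            }
        }

        return null;
    }
}
```

With a manifest like the Sherlock Holmes example, where the item with id "cover" is an XHTML page, the media-type filter skips that page and falls through to an image item.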

With the above changes, I was able to extract cover images from 100% of my EPUB files.
I also tested with the https://www.feedbooks.com/book/1421/the-adventures-of-sherlock-holmes EPUB file, and it works fine.

I will try to open a pull request for these changes.

Thanks!

@sidney-eliot

PDF books have the same issue. It would be great if those could also generate their own cover image from the first page.

sidney-eliot commented Nov 12, 2024

For the time being, this command generates thumbnails from each PDF's first page. More specifically, it recursively searches the folder where the command is executed and all folders below it, and creates a JPG image beside every PDF with the same name as the PDF. You can change the resolution and quality with -density ... and -quality ... (quality ranges from 0 to 100). I'm not sure at the moment how to make density use the original image resolution.

for /r %f in (*.pdf) do magick convert -density 120 -quality 70 "%f[0]" "%~dpnf.jpg"

Requires https://imagemagick.org and https://ghostscript.com to be installed. (You can either add the installed magick to your PATH environment variable or use the path to the exe instead of the word "magick" in the command.)

The command is written for Windows cmd.

@bt4ibwem8

This issue can be closed if it was merged, right?
