Tesseract might output multiple HOCR bodies when a TIFF is multilayered #104
Labels
bug
Something isn't working
External Bug
Not us, them
ocrhighlight
Post processor Plugins
The ones with a ->run() method
Solr Indexing
Putting things where they can be found
What?
Never a dull day. Multi-layered TIFFs (and possibly pyramidal ones) might be processed by Tesseract as a single file with two outputs.
The biggest problem is that the resulting HOCR body will then contain duplicated HTML IDs, which makes the parser fail.
I have no solution yet.
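As a starting point for debugging, here is a minimal sketch (not a fix, and not code from this project) that checks an HOCR string for duplicated `id` attributes and, optionally, renames the repeats so that each ID is unique. The function names `find_duplicate_ids` and `dedupe_ids` and the `_dupN` suffix are hypothetical, and the regex-based matching assumes the IDs appear as plain quoted `id='...'` attributes.

```python
import re
from collections import Counter

# Matches id='...' or id="..." attributes, but not data-id or similar.
# Assumption: IDs are written as plain quoted attributes in the HOCR markup.
ID_ATTR = re.compile(r"""(?<![\w-])(id\s*=\s*)(['"])(.*?)\2""")


def find_duplicate_ids(hocr: str) -> list[str]:
    """Return the sorted list of id values that occur more than once."""
    counts = Counter(m.group(3) for m in ID_ATTR.finditer(hocr))
    return sorted(ident for ident, n in counts.items() if n > 1)


def dedupe_ids(hocr: str) -> str:
    """Rename repeated ids by appending a hypothetical _dupN suffix.

    The first occurrence keeps its id; later ones get the first free
    suffixed name. Nothing else in the markup is touched.
    """
    used: set[str] = set()

    def rename(match: re.Match) -> str:
        prefix, quote, ident = match.group(1), match.group(2), match.group(3)
        candidate, n = ident, 1
        while candidate in used:
            candidate = f"{ident}_dup{n}"
            n += 1
        used.add(candidate)
        return f"{prefix}{quote}{candidate}{quote}"

    return ID_ATTR.sub(rename, hocr)


if __name__ == "__main__":
    # Made-up two-layer sample, only to exercise the functions.
    sample = (
        "<div class='ocr_page' id='page_1'><span id='line_1_1'>a</span></div>"
        "<div class='ocr_page' id='page_1'><span id='line_1_1'>b</span></div>"
    )
    print(find_duplicate_ids(sample))
    print(find_duplicate_ids(dedupe_ids(sample)))
```

Running the detection before handing the HOCR to the parser would at least confirm whether a given TIFF triggers the problem. Whether renaming IDs is acceptable, or whether the extra layer should be dropped instead, is still an open question.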