Release v2.0.0 · OCR-D/ocrd_pagetopdf

What's Changed

Fix typo in README.md by @konstantinschulz in #25
Remove unused batch files by @stweil in #26
Fix dockerimage creation by @joschrew in #27
rewrite: Pythonic, ocrd v3, utilise page-level annotation by @bertsky in #28
- use Python instead of bashlib: faster, more flexible
- include a distribution of the PRImA PDF converter as package data
- instead of just the original image files, extract image data from the PAGE annotation, including any AlternativeImage
- for that, introduce params image_feature_selector and image_feature_filter (e.g. cropped,deskewed,binarized)
- support processing with METS Server and all new ocrd>=3.0 user-configurable features (page-parallel processing, page timeouts, error handling)
- extend negative2zero to full PAGE validation and repairs for coordinates
- back the font parameter by downloadable resources (ocrd resmgr); provide a variety of preconfigured fonts
- multipage: add setting pagelabels=pagelabels for @ORDER and @ORDERLABEL from physical structMap
- multipage: add parameter multipage_only to only keep the document-wide PDF, not the page-wise PDF files
- multipage: add logical structMap divs as outline labels (PDF bookmarks)
- multipage: improve and add more metadata, use proper formatting (string encoding, dates)
- multipage: add MODS as extra XMP metadata payload
- improve logging and relaying error messages
- add processor ocrd-altotopdf (with limited features) besides ocrd-pagetopdf
- add regression tests, CI and CD
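
As a rough illustration of the image_feature_selector and image_feature_filter parameters mentioned above, the sketch below shows one way such a selection could work: an image qualifies if it carries every feature named in the selector and none named in the filter. This is not the processor's actual code. The helper names (parse_features, select_image), the example file names, and the rule of preferring the most recently derived image are all assumptions made for illustration.

```python
def parse_features(spec: str) -> set[str]:
    """Split a comma-separated feature string into a set, ignoring blanks."""
    return {f.strip() for f in spec.split(",") if f.strip()}


def select_image(candidates, selector="", filter_=""):
    """Pick the last candidate with all selected and no filtered features.

    `candidates` is a list of (filename, comma-separated features) pairs,
    ordered from the original image to the most recently derived one
    (hypothetical data layout, chosen for this sketch only).
    """
    required = parse_features(selector)
    forbidden = parse_features(filter_)
    # Walk backwards so that the most recently derived image wins.
    for filename, comments in reversed(candidates):
        features = parse_features(comments)
        if required <= features and not (forbidden & features):
            return filename
    return None


# Hypothetical images annotated for one page.
candidates = [
    ("orig.tif", ""),
    ("cropped.png", "cropped"),
    ("deskewed.png", "cropped,deskewed"),
    ("bin.png", "cropped,deskewed,binarized"),
]

# Require cropping and deskewing, but reject binarized images.
chosen = select_image(candidates, selector="cropped,deskewed", filter_="binarized")
```

With these example inputs, the selection skips the binarized image and falls back to the deskewed one. With both parameters empty, the most recently derived image is chosen.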

Full Changelog: v1.1.0...v2.0.0
