5.0.0rc1
Pre-release
This is a major release with a lot of breaking changes but most changes are easy to fix.
It focuses on type safety with the introduction of runtime checks: any call to zimscraperlib API must match the type definition or an exception will be raised.
Documentation is available as docstrings and on https://python-scraperlib.readthedocs.io
Main changes include:
- ZIM metadata handling has completely changed with new types for each kind of metadata.
- `i18n` module has been redesigned around a single main class `Language`
- New `rewriting` module for HTML/CSS/JS (JS rewriting being done at runtime via Wombat)
- Now supporting only Python 3.12
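To illustrate what the runtime checks mean in practice, here is a minimal stdlib-only sketch. The `enforce_types` decorator and the `set_title` function are hypothetical and are not part of zimscraperlib; the library relies on beartype (see Changed (Breaking)), which understands far more than the plain classes checked here.

```python
import functools
import inspect


def enforce_types(func):
    """Hypothetical stand-in for a runtime type checker (plain classes only)."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            if expected is inspect.Parameter.empty:
                continue  # parameter has no annotation, nothing to check
            # Only plain classes are checked; generics, unions and protocols
            # are ignored by this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name}={value!r} is not of type {expected.__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def set_title(title: str, max_length: int = 30) -> str:
    """Hypothetical API function: truncate a title to max_length characters."""
    return title[:max_length]
```

With such a check in place, `set_title("My ZIM")` goes through while `set_title(42)` raises at call time instead of failing later in unrelated code.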
Added
- Documentation using `mkdocs`, published on readthedocs.com (#92)
- `rewriting` module to rewrite URLs in content for generic scrapers
  - `rewriting.css` to rewrite URLs in CSS files
  - `rewriting.html` to rewrite URLs in HTML files
  - `rewriting.js` to rewrite URLs in JS files (at runtime, using `wombat`)
  - `wombat-setup` javascript module in `javascript/`
- `typing` module with custom types:
  - `Callback` to use where we expect callbacks
  - `SupportsWrite`, `SupportsRead`, `SupportsSeeking`, `SupportsSeekableRead` and `SupportsSeekableWrite`: protocols for IO type annotations
- `zim.metadata` module with a type-based approach for each kind of metadata and helpers for custom ones
  - [zim.metadata] `APPLY_RECOMMENDATIONS`: general flag to toggle openZIM-recommended constraints
  - [zim.metadata] Type-based classes: `Metadata`, `TextBasedMetadata`, `TextListBasedMetadata`, `DateBasedMetadata`, `IllustrationBasedMetadata`
  - [zim.metadata] Usage-based classes: `NameMetadata`, `LanguageMetadata`, `DefaultIllustrationMetadata`, etc.
  - [zim.metadata] `StandardMetadataList` to package the standard metadata
  - See details for additional API endpoints and variables
- [constants] `DEFAULT_WEB_REQUESTS_TIMEOUT` exposed for `download` module
- [download] `stream_file()` now accepts `timeout: int` param (defaults to constant timeout) (#222)
- [filesystem] `path_from` context manager to acquire a pathlib `Path` from `Path` or `TemporaryDirectory`
- [i18n] `Language`, `get_language()` and `get_language_or_none()`. See breaking changes
- [image.optimization] `OptimizePngOptions` dataclass to store PNG options
- [image.optimization] `OptimizeJpgOptions` dataclass to store JPEG options
- [image.optimization] `OptimizeGifOptions` dataclass to store GIF options
- [image.optimization] `OptimizeOptions` dataclass to store cross-formats options
- [inputs] `unique_values()` to deduplicate a list while preserving order
- [logging] `DEFAULT_FORMAT_WITH_THREADS` as many scrapers use threads
- [video.encoding] `reencode()`'s `existing_tmp_path` param
- [zim.filesystem] `validate_folder_writable()` to ensure one can write into a folder (#200)
- [zim.creator] `Creator._get_first_language_metadata_value()` to retrieve first language from metadata
- [zim.items] `no_indexing_indexdata()` to get an IndexData that disables indexing
- [zim.items] `URLItem.get_mimetype()` now only returning `str`
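As an illustration of what "deduplicate a list while preserving order" means, here is a minimal sketch of one common technique. The function name `dedupe_preserving_order` is hypothetical and this is not necessarily how `unique_values()` is implemented; it only assumes the items are hashable.

```python
from collections.abc import Hashable, Iterable
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_preserving_order(items: Iterable[T]) -> list[T]:
    """Hypothetical helper: drop duplicates, keep first occurrences in order."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(items))


languages = dedupe_preserving_order(["eng", "fra", "eng", "deu", "fra"])
```

Unlike `list(set(...))`, this keeps the first occurrence of each value in its original position, which matters when the order carries meaning (for instance a list of languages with the main one first).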
Changed (Breaking)
- Entire API is now type-protected using beartype. Any call to scraperlib that doesn't satisfy the annotated types will raise an exception
- [constants] `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` moved to `zim/metadata`
- [download] `YoutubeDownloader.download`'s `options` parameter now expects a `dict[str, Any]` instead of `dict`
- [download] `YoutubeConfig` options now limited to `str | bool | int | None`
- [download] `_get_retry_adapter()` now exposed as `get_retry_adapter()`
- [download] `stream_file`'s `byte_stream` param now more flexible, accepting `SupportsWrite[bytes] | SupportsSeekableWrite[bytes]`
- [download] `stream_file`'s `proxies` param now accepting `dict[str, str]` instead of `dict`
- [filesystem] `delete_callback()` is now a simple callback accepting an `fpath` and deleting it (doesn't chain other callbacks anymore).
- [filesystem] `delete_callback()` doesn't fail on missing file (#192)
- [i18n] Redesigned API around a single object: `Language`, which is initialized with any acceptable code. Raises `NotFoundError` on ISO 639-3 matching failure
  - `find_language_names()` is retained but only accepts a `query: str`
  - added `get_language()` and `get_language_or_none()` as shortcuts around `Language`
  - `is_valid_iso_639_3()` is retained
- [image.conversion] `convert_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
- [image.conversion] `convert_svg2png()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
- [image.optimization] `optimize_png()` now accepts `options: OptimizePngOptions` instead of individual params.
- [image.optimization] `optimize_jpeg()` now accepts `options: OptimizeJpgOptions` instead of individual params.
- [image.optimization] `optimize_webp()` now accepts `options: OptimizeWebpOptions` instead of individual params.
- [image.optimization] `optimize_gif()` now accepts `options: OptimizeGifOptions` instead of individual params.
- [image.presets] All presets now use the new options dataclass instead of ClassVar dict
- [image.probing] `format_for()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src`.
- [image.probing] `is_valid_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `image`.
- [image.utils] `save_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `dst`.
- [video.config] `Config` was mostly not using type annotations.
- [video.config] `Config` options only expecting `str | None`
- [video.presets] All options only expecting `str | None`
- [video.encoding] `reencode()` now always returning a `tuple[bool, CompletedProcess]`
- [zim._libkiwix] `MimetypeAndCounter` now expects specific types for `mimetype: str` and `value: int`
- [zim.filesystem] `make_zim_file()` `publisher` param now properly expects a `str`
- [zim.filesystem] `IncorrectZIMPathError` renamed to `IncorrectPathError`
- [zim.filesystem] `MissingZIMFolderError` renamed to `MissingFolderError`
- [zim.filesystem] `NotADirectoryZIMFolderError` renamed to `NotADirectoryFolderError`
- [zim.filesystem] `NotWritableZIMFolderError` renamed to `NotWritableFolderError`
- [zim.filesystem] `IncorrectZIMFilenameError` renamed to `IncorrectFilenameError`
- [zim.filesystem] `validate_zimfile_creatable()` renamed to `validate_file_creatable()`
- [zim.items] `Item` and `StaticItem` now expecting `hints` as `dict[libzim.writer.Hint, int]` instead of `dict`
- [zim.items] `Item.get_hints()` now returning `dict[libzim.writer.Hint, int]` instead of `dict`
- [zim.items] `URLItem.download_for_size()` now specifying type annotations and reordered params
- [zim.providers] `FileLikeProvider.gen_blob()` and `URLProvider.gen_blob()` now properly annotate return type (`Generator[libzim.writer.Blob, None, None]`)
- [zim.providers] `URLProvider.get_size_of()` param `url` now explicitly expects a `str`
- [zim.creator] `Creator.config_metadata()` signature changed, now mainly accepting a `StandardMetadataList`
- [zim.creator] `Creator.config_dev_metadata()` signature changed to accept new metadata types
- [zim.creator] `Creator.add_item_for()`'s `callback` renamed to `callbacks` and accepting `Callback`
- [zim.creator] `Creator.add_item()`'s `callback` renamed to `callbacks` and accepting `Callback`
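Several of the changes above replace individual keyword parameters with a single options dataclass. The pattern can be sketched as follows; `ResizeOptions`, its fields and `describe_resize` are hypothetical names chosen for illustration and do not reflect the actual fields of `OptimizePngOptions` and the other option classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ResizeOptions:
    """Hypothetical options container; field names are illustrative only."""

    max_width: int = 800
    keep_ratio: bool = True


def describe_resize(options: ResizeOptions) -> str:
    """Hypothetical consumer taking one options object instead of many params."""
    mode = "keeping ratio" if options.keep_ratio else "stretching"
    return f"resize to {options.max_width}px, {mode}"


defaults = ResizeOptions()
# replace() derives a variant without mutating the shared defaults
thumbnail = replace(defaults, max_width=96)
```

Grouping options this way gives each setting a declared type and default, so a typo in an option name fails at construction time rather than being silently ignored as it could be with a plain `dict`.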
Changed
- [deps] `iso639-lang` now requires at least v2.4.0
- [download] `stream_file()` now returns `tuple[int, requests.structures.CaseInsensitiveDict[str]]` instead of `tuple[int, requests.structures.CaseInsensitiveDict]`
- [download] `stream_file()` now accepts both `fpath` and `byte_stream` params (writes to both)
- [image.utils] `save_image()` now accepts `Any` `**params`.
- [zim.archive] `Archive.counters` now returning `CounterMap` (compatible with previous `dict[str, int]`)
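The "writes to both" behaviour, combined with write protocols for the stream parameter, can be pictured with the stdlib-only sketch below. `BytesWriter` and `write_chunks` are hypothetical stand-ins; this is not the code of `stream_file()` and performs no network access.

```python
import io
import tempfile
from collections.abc import Iterable
from pathlib import Path
from typing import Optional, Protocol


class BytesWriter(Protocol):
    """Hypothetical protocol: anything with a write(bytes) method qualifies."""

    def write(self, data: bytes, /) -> int: ...


def write_chunks(
    chunks: Iterable[bytes],
    fpath: Optional[Path] = None,
    byte_stream: Optional[BytesWriter] = None,
) -> int:
    """Write every chunk to the file and/or the stream; return total bytes."""
    total = 0
    handle = fpath.open("wb") if fpath is not None else None
    try:
        for chunk in chunks:
            if handle is not None:
                handle.write(chunk)
            if byte_stream is not None:
                byte_stream.write(chunk)
            total += len(chunk)
    finally:
        if handle is not None:
            handle.close()
    return total


# Usage: the same chunks end up in a file on disk and in an in-memory buffer.
buffer = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "payload.bin"
    written = write_chunks([b"ab", b"cd"], fpath=target, byte_stream=buffer)
    on_disk = target.read_bytes()
```

Typing the stream against a protocol rather than a concrete class is what lets callers pass an `io.BytesIO`, an open file, or any custom object exposing `write()`.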
Fixed
- Direct dependencies are now properly referenced: pillow, urllib3, piexif, idna (#226)
- [download] `YoutubeDownloader.download` now respects its return type (`bool | Future[Any]`)
- [image.conversion] `convert_image()` `**params` properly declared as accepting `None`.
- [logging] `getLogger()`'s `console` now properly accepting `TextIO | io.StringIO | None`
- [video.probing] `get_media_info()` type annotation for `src_path`
- [zim.archive] `Archive.get_item()` return type (`libzim.reader.Item`)
Removed
- Support for Python 3.8/3.9/3.10/3.11. Only Python 3.12 is supported now.
- [i18n] `Lang` (See breaking changes)
- [i18n] `get_iso_lang_data()` (See breaking changes)
- [i18n] `update_with_macro()` (See breaking changes)
- [i18n] `get_language_details()` (See breaking changes)
- [uri] `rebuild_uri`'s `failsafe` param (was only handling incorrect types)
- [video.encoding] `reencode()`'s `with_process` param
- [zim.creator] `Creator.validate_metadata()`
- [zim.creator] `Creator.convert_and_check_metadata()`