Add PDF Image Extractor script with README documentation #500

gracetyy · 2025-10-20T10:48:00Z

This PR adds a new script, PDF Image Extractor, which recursively scans a directory tree for PDF files and extracts all embedded images from each document.
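The recursive scan described above could look roughly like the sketch below. It is not taken from the PR's script: the function name `find_pdfs` and the case-insensitive extension match are assumptions made for illustration.

```python
from pathlib import Path


def find_pdfs(root):
    """Return every PDF file under `root`, searching subdirectories too.

    Hypothetical helper: the actual script may discover files differently.
    The extension check is case-insensitive so `REPORT.PDF` is also found.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Sorting the result keeps the processing order stable between runs, which makes logs easier to compare.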

By default, extracted images are saved in a subfolder named PDF inside the input root directory; use --out to choose a different location.
Each PDF gets its own folder containing all images extracted from that document.
The script supports an optional --dedup flag to enable per-PDF deduplication of images.
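As a minimal sketch of the output layout and the per-PDF deduplication described above, the snippet below uses only the standard library. It makes several assumptions that the PR text does not confirm: each PDF's folder is named after the file's stem, duplicates are detected by hashing the raw image bytes with SHA-256, and the helper names `output_dir_for` and `dedup_images` are hypothetical. The image extraction itself is not shown.

```python
import hashlib
from pathlib import Path


def output_dir_for(pdf_path, input_root, out=None):
    """Return the folder that would hold the images of one PDF.

    Assumption: the folder is named after the PDF's stem. The default
    base is `<input_root>/PDF`; passing `out` mirrors the --out option.
    """
    base = Path(out) if out else Path(input_root) / "PDF"
    return base / Path(pdf_path).stem


def dedup_images(images):
    """Drop byte-identical images from one PDF, keeping first occurrences.

    Assumption: duplicates are detected by a SHA-256 digest of the raw
    bytes; the real --dedup flag may use a different strategy.
    """
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique
```

Because the `seen` set is created inside `dedup_images`, calling it once per document keeps deduplication scoped to a single PDF. Note that naming folders by stem alone would make two PDFs with the same file name in different subdirectories share a folder, so a real implementation may need to disambiguate them.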

Additional notes:

Please let me know if you’d like any changes to the folder naming or CLI options.
Happy to update documentation or add more examples if needed.

Commit d68586e: Add PDF Image Extractor script with README documentation
