Clowder extractor for PyMuPDF. The extractor takes a PDF file as input and outputs JSON and CSV files containing the textual contents of the PDF.
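The extraction step can be pictured with a minimal Python sketch. It assumes PyMuPDF's fitz.open and Page.get_text() for reading page text. The helper names and the per-page JSON and CSV layout (page and text fields) are hypothetical and may not match what extractor.py actually writes.

```python
import csv
import io
import json
from typing import List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the plain text of each page, in page order, using PyMuPDF."""
    import fitz  # PyMuPDF; imported here so the formatting helpers below work without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def pages_to_json(page_texts: List[str]) -> str:
    """Serialize page texts as a JSON list of {"page", "text"} records (hypothetical schema)."""
    records = [{"page": i, "text": t} for i, t in enumerate(page_texts, start=1)]
    return json.dumps(records, ensure_ascii=False, indent=2)


def pages_to_csv(page_texts: List[str]) -> str:
    """Serialize page texts as CSV with a "page,text" header row (hypothetical schema)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["page", "text"])
    for i, text in enumerate(page_texts, start=1):
        # csv.writer quotes fields containing commas or newlines automatically.
        writer.writerow([i, text])
    return buf.getvalue()
```

For example, texts = extract_page_texts("example.pdf") followed by writing pages_to_json(texts) and pages_to_csv(texts) to files. Open the CSV output file with newline="" so the csv module's line endings are preserved.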
- Activate the virtual environment.
- Install dependencies:
  pip install -r requirements.txt
- Run the extractor:
  python extractor.py
- Build the Docker image by running:
  docker build . -t hub.ncsa.illinois.edu/clowder/extractors-pymupdf:<version>
- If you run into the error "[Errno 28] No space left on device:", try the following:
  - Free more space by running:
    docker system prune --all
  - Increase the disk image size. You can find this setting in Docker Desktop.
- Log in first:
  docker login hub.ncsa.illinois.edu
- Push the image:
  docker image push hub.ncsa.illinois.edu/clowder/extractors-pymupdf:<version>
- For deployment, please refer to the Clowder instructions.
- Current deployment:
  hub.ncsa.illinois.edu/clowder/extractors-pymupdf:0.2.0.0
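The build, login and push steps can also be scripted. The sketch below is a hypothetical helper (image_tag, release_commands and release are illustrative names, not part of this repository) that assembles the same docker commands listed in this README for a given version and, unless dry_run is set, runs them with subprocess, stopping at the first failure.

```python
import subprocess
from typing import List

# Image repository taken from this README; the version is supplied by the caller.
IMAGE_REPO = "hub.ncsa.illinois.edu/clowder/extractors-pymupdf"


def image_tag(version: str) -> str:
    """Return the full image reference, e.g. <repo>:0.2.0.0."""
    if not version or any(c.isspace() for c in version):
        raise ValueError(f"invalid version: {version!r}")
    return f"{IMAGE_REPO}:{version}"


def release_commands(version: str) -> List[List[str]]:
    """Return the README's docker commands in order: build, login, push."""
    tag = image_tag(version)
    return [
        ["docker", "build", ".", "-t", tag],
        ["docker", "login", "hub.ncsa.illinois.edu"],
        ["docker", "image", "push", tag],
    ]


def release(version: str, dry_run: bool = True) -> None:
    """Print each command; execute it unless dry_run is set (check=True stops on failure)."""
    for cmd in release_commands(version):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, release("0.2.0.0") only prints the three commands, while release("0.2.0.0", dry_run=False) runs them; docker login may prompt for credentials interactively.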