A feature-rich command-line tool for manipulating PDF files.
- Merge PDFs: Combine multiple PDF files into a single document.
- Split PDFs: Divide a PDF into multiple smaller files based on page ranges.
- Rotate Pages: Rotate all pages in a PDF by a specified angle.
- Protect PDFs: Add password protection to your PDF documents.
- Unprotect PDFs: Remove password protection from PDFs.
- Create from Text: Generate PDFs from text files or directly from raw text input.
- Create from Images: Convert one or more image files into a PDF document.
- Extract Text: Extract all text content from a PDF file.
- Extract Images: Extract all images embedded within a PDF file.
- Convert to Images: Convert each page of a PDF into an image file (e.g., PNG).
- Delete Pages: Remove specific pages from a PDF document.
- Watermark PDFs: Add a watermark (from another PDF) to your PDF documents.
- Reorder Pages: Change the order of pages within a PDF.
- Compress PDFs: Reduce the file size of PDF documents.
pdfcli/
├── pdfcli/                   # Main application source code
│   ├── __init__.py
│   ├── main.py               # CLI entry point
│   ├── merge.py              # PDF merging logic
│   ├── split.py              # PDF splitting logic
│   ├── rotate.py             # PDF rotation logic
│   ├── protect.py            # PDF protection logic
│   ├── unprotect.py          # PDF unprotection logic
│   ├── fromtext.py           # PDF creation from text logic
│   ├── fromimages.py         # PDF creation from images logic
│   ├── extract_text.py       # Text extraction logic
│   ├── watermark.py          # Watermark adding logic
│   ├── reorder.py            # Page reordering logic
│   ├── compress.py           # PDF compression logic
│   ├── extract_images.py     # Image extraction logic
│   ├── to_images.py          # PDF to image conversion logic
│   └── delete_pages.py       # Page deletion logic
├── tests/                    # Unit tests
│   ├── __init__.py
│   ├── test_merge.py
│   ├── test_split.py
│   ├── test_rotate.py
│   ├── test_protect.py
│   ├── test_unprotect.py
│   ├── test_fromtext.py
│   ├── test_fromimages.py
│   ├── test_extract_text.py
│   ├── test_watermark.py
│   ├── test_reorder.py
│   ├── test_compress.py
│   ├── test_extract_images.py
│   ├── test_to_images.py
│   └── test_delete_pages.py
├── .github/                  # GitHub Actions workflows
│   └── workflows/
│       └── main.yml
├── requirements.txt          # Python dependencies
├── setup.py                  # Project setup and packaging
└── README.md                 # Project documentation
- Python 3.x
- pip (Python package installer)
Once published to PyPI, you can install it directly:
pip install pdfcli
Clone the repository:
git clone https://github.com/your-repo/pdfcli.git
cd pdfcli
Install the required dependencies:
pip install -r requirements.txt
Install the package in editable mode (for development):
pip install -e .
The pdfcli tool uses a command-line interface. Each feature is a subcommand with its own arguments.
pdfcli <command> [options]
Merge multiple PDF files into one.
pdfcli merge -o output.pdf input1.pdf input2.pdf [input3.pdf ...]
- -o, --output: Path to the output PDF file (required).
- input: Paths to the input PDF files (one or more, required).
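The subcommand layout described above could be wired up with Python's standard argparse module. The following is a minimal sketch covering only the merge command; the function name build_parser is hypothetical, and the project's actual main.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one illustrative `merge` subcommand."""
    parser = argparse.ArgumentParser(prog="pdfcli")
    # Each feature becomes a subcommand with its own arguments.
    subparsers = parser.add_subparsers(dest="command", required=True)

    merge = subparsers.add_parser("merge", help="Merge multiple PDF files into one.")
    merge.add_argument("-o", "--output", required=True, help="Path to the output PDF file.")
    merge.add_argument("input", nargs="+", help="Paths to the input PDF files.")
    return parser


args = build_parser().parse_args(["merge", "-o", "output.pdf", "input1.pdf", "input2.pdf"])
print(args.command, args.output, args.input)
```

Using nargs="+" for the positional argument is what enforces the "one or more" requirement on the input files.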
Split a PDF into multiple files based on page ranges.
pdfcli split -o output_prefix -r 1-5 6-10 input.pdf
- -o, --output: Output path prefix for the split files (required).
- -r, --ranges: Page ranges to split (e.g., 1-5 for pages 1 to 5, 6-10 for pages 6 to 10).
- input: Path to the input PDF file (required).
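To illustrate how range arguments such as 1-5 and 6-10 can be interpreted, here is a small sketch that parses a 1-based inclusive range and expands it into zero-based page indexes, which is the indexing many Python PDF libraries use. The helper names parse_page_range and to_page_indexes are hypothetical and not taken from the project.

```python
from typing import List, Tuple


def parse_page_range(spec: str) -> Tuple[int, int]:
    """Parse a 1-based inclusive range such as "1-5" into (start, end)."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    return start, end


def to_page_indexes(spec: str) -> List[int]:
    """Expand a range into zero-based page indexes."""
    start, end = parse_page_range(spec)
    return list(range(start - 1, end))


for spec in ["1-5", "6-10"]:
    print(spec, "->", to_page_indexes(spec))
```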
Rotate all pages in a PDF by a specified angle.
pdfcli rotate -o output.pdf -r 90 input.pdf
- -o, --output: Path to the output PDF file (required).
- -r, --rotation: Rotation angle in degrees (e.g., 90, 180, 270).
- input: Path to the input PDF file (required).
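Page rotation in PDF is stored as a multiple of 90 degrees, which is why the example angles are 90, 180, and 270. A tool might validate and normalize the angle before applying it, roughly as sketched below. The function normalize_rotation is hypothetical; whether pdfcli performs this check is an assumption.

```python
def normalize_rotation(angle: int) -> int:
    """Map an angle to 0, 90, 180, or 270; reject angles that are not multiples of 90."""
    if angle % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90 degrees, got {angle}")
    return angle % 360


print(normalize_rotation(90))
print(normalize_rotation(-90))
print(normalize_rotation(450))
```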
Add a password to a PDF.
pdfcli protect -o output.pdf -p mypassword input.pdf
- -o, --output: Path to the output PDF file (required).
- -p, --password: Password to add (required).
- input: Path to the input PDF file (required).
Remove a password from a PDF.
pdfcli unprotect -o output.pdf -p mypassword input.pdf
- -o, --output: Path to the output PDF file (required).
- -p, --password: Password to remove (required).
- input: Path to the input PDF file (required).
Create a PDF from a text file or raw text.
# From a text file
pdfcli fromtext --input input.txt -o output.pdf
# From raw text
pdfcli fromtext --text "Hello, this is raw text content." -o output.pdf
- --input: Path to the input text file (optional; provide either --input or --text).
- --text: Raw text content to convert to PDF (optional; provide either --text or --input).
- -o, --output: Path to the output PDF file (required).
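Both examples above supply exactly one text source. If the tool treats --input and --text as alternatives (an assumption, since the README does not show its parser), argparse can enforce that with a mutually exclusive group. The function name build_fromtext_parser is hypothetical.

```python
import argparse


def build_fromtext_parser() -> argparse.ArgumentParser:
    """Sketch of a fromtext parser that requires exactly one text source."""
    parser = argparse.ArgumentParser(prog="pdfcli fromtext")
    # Assumption: exactly one of --input / --text must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="Path to the input text file.")
    source.add_argument("--text", help="Raw text content to convert to PDF.")
    parser.add_argument("-o", "--output", required=True, help="Path to the output PDF file.")
    return parser


args = build_fromtext_parser().parse_args(
    ["--text", "Hello, this is raw text content.", "-o", "output.pdf"]
)
print(args.input, args.text, args.output)
```

With this setup, passing both options or neither makes argparse exit with a usage error instead of silently picking one.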
Create a PDF from image files.
pdfcli fromimages -o output.pdf image1.png image2.jpg [image3.jpeg ...]
- -o, --output: Path to the output PDF file (required).
- input: Paths to the input image files (one or more, required).
Extract text from a PDF file.
pdfcli extracttext -o output.txt input.pdf
- -o, --output: Path to the output text file (required).
- input: Path to the input PDF file (required).
Add a watermark to a PDF file.
pdfcli watermark -o output.pdf -w watermark.pdf input.pdf
- -o, --output: Path to the output PDF file (required).
- -w, --watermark: Path to the watermark PDF file (required).
- input: Path to the input PDF file (required).
Reorder pages in a PDF file.
pdfcli reorder -o output.pdf -p 3,1,2 input.pdf
- -o, --output: Path to the output PDF file (required).
- -p, --pages: Comma-separated new order of pages (e.g., 3,1,2 to make page 3 the first, page 1 the second, and page 2 the third).
- input: Path to the input PDF file (required).
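The 3,1,2 example can be read as a permutation of 1-based page numbers. The sketch below converts such a string into zero-based indexes and applies it to a list standing in for the pages. The helper parse_page_order is hypothetical, and requiring every page to appear exactly once is an assumption about the tool's behavior.

```python
from typing import List


def parse_page_order(spec: str, page_count: int) -> List[int]:
    """Turn "3,1,2" into zero-based indexes, checking it lists every page exactly once."""
    order = [int(part) for part in spec.split(",")]
    if sorted(order) != list(range(1, page_count + 1)):
        raise ValueError(f"order must list each of pages 1..{page_count} exactly once")
    return [page - 1 for page in order]


pages = ["page 1", "page 2", "page 3"]
reordered = [pages[i] for i in parse_page_order("3,1,2", len(pages))]
print(reordered)  # ['page 3', 'page 1', 'page 2']
```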
Compress a PDF file.
pdfcli compress -o compressed.pdf input.pdf
- -o, --output: Path to the output compressed PDF file (required).
- input: Path to the input PDF file (required).
Extract images from a PDF file.
pdfcli extractimages -o output_directory input.pdf
- -o, --output: Path to the output directory for images (required).
- input: Path to the input PDF file (required).
Convert PDF pages to image files.
pdfcli toimages -o output_directory input.pdf
pdfcli toimages -o output_directory -z 3 input.pdf # With a zoom factor
- -o, --output: Path to the output directory for images (required).
- -z, --zoom: Zoom factor for image conversion (default: 2). Higher values produce higher-resolution images.
- input: Path to the input PDF file (required).
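To give a feel for what the zoom factor means in pixels, the sketch below assumes the renderer maps one PDF point (1/72 inch) to one pixel at zoom 1, so the zoom simply scales both dimensions. The README does not confirm this mapping, and the helper names rendered_size and effective_dpi are hypothetical.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user space defaults to 1/72 inch per unit


def rendered_size(width_pt: float, height_pt: float, zoom: float = 2) -> Tuple[int, int]:
    """Pixel dimensions of a page, assuming one pixel per point at zoom 1."""
    return round(width_pt * zoom), round(height_pt * zoom)


def effective_dpi(zoom: float = 2) -> float:
    """Resolution implied by the zoom factor under the same assumption."""
    return POINTS_PER_INCH * zoom


# A US Letter page is 612 x 792 points.
print(rendered_size(612, 792))     # (1224, 1584)
print(rendered_size(612, 792, 3))  # (1836, 2376)
print(effective_dpi(3))            # 216
```

Under this assumption, the default zoom of 2 corresponds to 144 DPI, and each increment in zoom adds another 72 DPI.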
Delete specific pages from a PDF file.
pdfcli deletepages -o output.pdf -p 2,4 input.pdf
- -o, --output: Path to the output PDF file (required).
- -p, --pages: Comma-separated page numbers to delete (e.g., 2,4 to delete page 2 and page 4).
- input: Path to the input PDF file (required).
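Deleting pages amounts to keeping every page whose number is not in the list. A minimal sketch of that selection step follows; the helper pages_to_keep is hypothetical, and rejecting out-of-range page numbers is an assumption rather than documented behavior.

```python
from typing import List


def pages_to_keep(delete_spec: str, page_count: int) -> List[int]:
    """Return zero-based indexes of the pages that survive deletion."""
    to_delete = {int(part) for part in delete_spec.split(",")}
    out_of_range = sorted(p for p in to_delete if not 1 <= p <= page_count)
    if out_of_range:
        raise ValueError(f"page numbers out of range 1..{page_count}: {out_of_range}")
    return [i for i in range(page_count) if i + 1 not in to_delete]


# Deleting pages 2 and 4 from a five-page document keeps pages 1, 3, and 5.
print(pages_to_keep("2,4", 5))  # [0, 2, 4]
```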
To run all unit tests:
python -m unittest discover tests
To run a specific test file:
python -m unittest tests/test_merge.py
To build a standalone executable for your current OS:
pyinstaller --onefile --name pdfcli pdfcli/main.py
The executable will be found in the dist/ directory.
This project uses GitHub Actions for continuous integration and continuous delivery (CI/CD). The workflows are defined in the .github/workflows/ directory.
This workflow performs the following actions:
- Tests: Runs unit tests on ubuntu-latest, windows-latest, and macos-latest environments.
- Build & Bundle: Creates standalone executables for ubuntu-latest, windows-latest, and macos-latest using PyInstaller.
- Release: Creates a GitHub Release with the bundled executables when a new tag is pushed.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.