Update pdfminer_utils.py #3974

Merged: 4 commits, Apr 8, 2025

Conversation

Nathan-GoSupply
Contributor

Fix for the `PSSyntaxError` import error:
"cannot import name 'PSSyntaxError' from 'pdfminer.pdfparser'"

The latest pdfminer-six no longer imports `PSSyntaxError` into `pdfminer.pdfparser`. It must now be imported directly from its source (`pdfminer.psexceptions`).

Contributor

@cragwolfe cragwolfe left a comment

Thanks for the patch. However, I think the pdfminer version should be checked to decide whether to import from the older or the newer location, for backward compatibility.

@Nathan-GoSupply
Contributor Author

This change also works with the older version. There, `pdfminer.pdfparser` imports `PSSyntaxError` from `pdfminer.psexceptions`.

However, they have since removed the `PSSyntaxError` import from `pdfminer.pdfparser`.

Therefore, for the new pdfminer version we must import directly from `pdfminer.psexceptions`.
So instead of
pdfminer_utils.py -> pdfminer.pdfparser -> pdfminer.psexceptions

we can do
pdfminer_utils.py -> pdfminer.psexceptions

`PSSyntaxError` is defined in `pdfminer.psexceptions` in both the old and new versions of pdfminer, so backward compatibility is preserved.
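For anyone who wants to confirm this against their installed pdfminer, here is a minimal sketch that reports which module exposes a given name. The helper `first_module_exporting` and the `CANDIDATES` list are hypothetical names for illustration, not part of this PR or of pdfminer; the two module paths are the ones discussed above.

```python
import importlib


def first_module_exporting(name, module_paths):
    """Return the first path in `module_paths` that imports and exposes `name`.

    Hypothetical helper for illustration only; returns None if no module matches.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # The module (or its whole package) is not installed; try the next one.
            continue
        if hasattr(module, name):
            return path
    return None


# Defining module first, former re-exporting module second (paths from this thread).
CANDIDATES = ["pdfminer.psexceptions", "pdfminer.pdfparser"]

# The import this PR switches to, per the discussion above:
#   from pdfminer.psexceptions import PSSyntaxError
```

Calling `first_module_exporting("PSSyntaxError", CANDIDATES)` should return `"pdfminer.psexceptions"` whenever that module is importable, and `None` if pdfminer is not installed at all.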

Here is the commit for the change on pdfminer.

@cragwolfe
Contributor

Please add a bullet under Fixes in CHANGELOG.md.

Thanks for the contribution, @Nathan-GoSupply! The reference to the pdfminer commit is also appreciated.

Nathan-GoSupply added a commit to Nathan-GoSupply/unstructured that referenced this pull request Apr 8, 2025
@cragwolfe cragwolfe merged commit 27f503c into Unstructured-IO:main Apr 8, 2025
43 checks passed