page.find_tables how to extract words in table cell #3768
Unanswered
wangqiangJN asked this question in Looking for help
Replies: 1 comment
-
This feature is already there. A minor issue may arise when the table has very narrow cell borders: the table finder might then assign content to a cell even though that content is technically not completely inside the cell's rectangle. General text extraction is very strict and will discard everything that is not completely inside the rectangle.
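The code that went with this reply did not survive on this page, so the following is only a minimal sketch of one way to do it, not necessarily what the reply showed. It assumes that each entry of `table.cells` is a bbox tuple, that `page.get_text("words", clip=...)` returns tuples of the form `(x0, y0, x1, y1, word, block_no, line_no, word_no)`, and that the `delimiters` argument (available in recent PyMuPDF versions, as far as I know since 1.23.5) can be used to split `price:520` at the colon. The helper name `words_in_cells` and the `pad` parameter are made up for illustration; `pad` enlarges each cell slightly to work around the narrow-border issue described above.

```python
def words_in_cells(page, delimiters=":", pad=1.0):
    """Collect (cell_bbox, [(word, word_bbox), ...]) for every table cell on a page.

    `page` is expected to be a PyMuPDF Page. `pad` (hypothetical parameter)
    grows each cell rectangle a little, so that words touching a thin cell
    border are not discarded by the strict clip test.
    """
    results = []
    for table in page.find_tables().tables:
        for cell in table.cells:
            if cell is None:  # defensive: skip missing cells
                continue
            x0, y0, x1, y1 = cell
            clip = (x0 - pad, y0 - pad, x1 + pad, y1 + pad)
            # Besides white space, also treat the given characters as word separators.
            words = page.get_text("words", clip=clip, delimiters=delimiters)
            # Keep only the word string (index 4) and its bbox (first four items).
            results.append((tuple(cell), [(w[4], tuple(w[:4])) for w in words]))
    return results
```

With a real document this would be called as `words_in_cells(pymupdf.open("input.pdf")[0])`, where `input.pdf` is a placeholder file name. Keep `pad` small: a large value can pull in words from neighbouring cells.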
-
Is your feature request related to a problem? Please describe.
PyMuPDF version 1.24.9
1. I want to parse my PDF, which contains tables and other text.
2. When I use page.find_tables, it extracts the text of each cell, but when a cell contains multiple words, I get them back as a single string. For example:
example cell:
price:520 people:bob
expected result, word by word:
price
520
people
bob
but the current result is:
price:520 people:bob
3. So I want to split the table cell content into words and get the bbox of each word. Is there any function, method, or other solution for this?