Replies: 4 comments 6 replies
-
Hello Martin Thoma, I see you edited my post almost immediately. Could you please help me with an example of how to extract the text at those coordinates using the visitor?
-
The first snippet is part of the annotations documentation; the second is about visitor functions. They are completely different, and I don't know why you think there is a link between the two.
-
Can you please provide an example of how to EXTRACT the text from the annotation coordinates?
-
OK, so the simple answer is: No, pypdf can NOT extract the text under highlight annotations. This also answers #701. Some libraries that CAN handle this:
-
I want to extract highlight annotations.
I am following the example at:
https://pypdf.readthedocs.io/en/latest/user/reading-pdf-annotations.html#highlights
How do I use the coordinates from the last line of that example to extract the text?
x1, y1, x2, y2, x3, y3, x4, y4 = coords
I understand I should use visitor_text:
https://pypdf.readthedocs.io/en/latest/user/extract-text.html?highlight=extract_text#using-a-visitor
https://pypdf.readthedocs.io/en/latest/modules/PageObject.html?highlight=visitor_text#pypdf._page.PageObject.extract_text
But the use of this function is very confusing to me, and I can't wrap my head around the two examples provided (Ignore header and footer; Extract rectangles and texts into a SVG-file).
Would anybody be so kind as to show me the link between the following code examples:
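One possible link between the two, as a rough sketch rather than a supported pypdf feature: take the bounding box of the /QuadPoints from the annotations example and use it as the filter inside a visitor_text callback. The helper names (quad_bbox, text_origin, make_bbox_visitor, highlighted_text) and the tolerance value are hypothetical. The sketch assumes the visitor receives tm as the text matrix and cm as the current transformation matrix, so the two have to be composed to get a page position; whether that holds may depend on the pypdf version and on the PDF, so check it against your own files. It also keeps only text chunks that start inside the box, so it can include or miss text at the edges of a highlight.

```python
def quad_bbox(quad_points):
    """Bounding box (x_min, y_min, x_max, y_max) of one or more quads."""
    values = [float(v) for v in quad_points]
    xs, ys = values[0::2], values[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def text_origin(cm, tm):
    """Map the text-matrix origin through cm (ASSUMPTION: tm is not yet combined with cm)."""
    x = tm[4] * cm[0] + tm[5] * cm[2] + cm[4]
    y = tm[4] * cm[1] + tm[5] * cm[3] + cm[5]
    return x, y


def make_bbox_visitor(bbox, collected, tolerance=2.0):
    """Build a visitor_text callback that keeps chunks starting inside bbox."""
    x_min, y_min, x_max, y_max = bbox

    def visitor(text, cm, tm, font_dict, font_size):
        x, y = text_origin(cm, tm)
        inside_x = x_min - tolerance <= x <= x_max + tolerance
        inside_y = y_min - tolerance <= y <= y_max + tolerance
        if inside_x and inside_y:
            collected.append(text)

    return visitor


def highlighted_text(page):
    """Return one string per /Highlight annotation on a pypdf page (hypothetical helper)."""
    results = []
    if "/Annots" not in page:
        return results
    for annot in page["/Annots"]:
        obj = annot.get_object()
        if obj["/Subtype"] != "/Highlight":
            continue
        collected = []
        visitor = make_bbox_visitor(quad_bbox(obj["/QuadPoints"]), collected)
        # Re-runs text extraction once per highlight; simple, not efficient.
        page.extract_text(visitor_text=visitor)
        results.append("".join(collected).strip())
    return results
```

With a PdfReader opened on a file, this would be called as highlighted_text(reader.pages[0]). If the results look shifted or empty, try returning (tm[4], tm[5]) or (cm[4], cm[5]) directly from text_origin to see which convention your PDF follows.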