-
Describe the bug (mandatory)
I am using a PDF file and trying to extract the highlighted text.

My code:
def main():

To Reproduce (mandatory)
I run the above code. The traceback includes:
File "\pythonextractHighLightFromPdf\main.py", line 11, in main

Your configuration (mandatory)
Installed using pip.
Replies: 9 comments 15 replies
-
This issue is not reproducible because there is no reproducing document.
-
Please confirm that you determined your file is corrupt or let us have it to reproduce the problem.
-
Uploaded the file used as example. The PDF opens everywhere else as expected, and I don't see any sign of corruption. Thank you for your time and for looking into it.
-
main.txt
-
This is not a bug, so I am continuing the research as a "Discussions" item.
-
Modifying your iterator like this will make sure you are only dealing with annotations, and only with the annotation types you actually want:

    for annot in page.annots(
        types=(fitz.PDF_ANNOT_HIGHLIGHT, fitz.PDF_ANNOT_UNDERLINE)  # add more types as you like
    ):
        ...
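For the original goal (getting the text under each highlight), here is a minimal sketch of one common approach, not necessarily what your script does. It assumes each highlight stores its quads in `annot.vertices` as a flat list of points, four per quad, and that the bounding box of a quad is a good enough clip rectangle. The names `quads_to_rects` and `highlighted_text` are made up for this illustration.

```python
def quads_to_rects(vertices):
    """Group a flat list of (x, y) points into bounding boxes, four points per box.

    Returns a list of (x0, y0, x1, y1) tuples. Leftover points that do not
    form a complete group of four are ignored.
    """
    rects = []
    for i in range(0, len(vertices) - 3, 4):
        quad = vertices[i:i + 4]
        xs = [p[0] for p in quad]
        ys = [p[1] for p in quad]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def highlighted_text(page):
    """Return one text snippet per highlighted quad on a PyMuPDF page.

    Hypothetical helper: `page` is assumed to be a fitz.Page.
    """
    import fitz  # PyMuPDF; imported here so quads_to_rects() has no dependencies

    snippets = []
    # Ask only for highlight annotations, as suggested above.
    for annot in page.annots(types=(fitz.PDF_ANNOT_HIGHLIGHT,)):
        # annot.vertices may be None for annotations without quad points.
        for rect in quads_to_rects(annot.vertices or []):
            snippets.append(page.get_textbox(fitz.Rect(rect)))
    return snippets
```

You would call `highlighted_text(page)` for each page of the opened document. Text that only partly overlaps a quad may be cut off or picked up from a neighbouring line, so treat the result as a starting point.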
-
Just as a hint: to include code on GitHub, put a line consisting of three backticks ("`") before and after it. This will preserve leading spaces, etc.
-
@JorjMcKie I received the error "xref 25051 is not an annot of this page" while deleting an annotation using delete_annot(). Could you please suggest a solution?
The statement `page = doc.load_page` makes no sense at all and can be deleted.

With `for annot in page.annots():` you select every item in the page's annotations array, not only the highlight annotations that you actually want. Therefore, on page 3 (page.number = 2) you are running into that page's array item `732 0 R`, which is not an existing PDF object in this document. Clearly an invalid specification.
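If you want to see for yourself which objects a page's annotations array points to, something along these lines may help. It is only a rough diagnostic sketch: it assumes the page's `Annots` entry is stored as a direct array (not as a reference to another object), and the range check only catches cross-reference numbers outside the document's xref table, not every kind of invalid entry. The function names are invented for this example.

```python
import re


def indirect_refs(array_text):
    """Extract xref numbers from PDF array text such as '[730 0 R 731 0 R]'."""
    return [int(num) for num, _gen in re.findall(r"(\d+)\s+(\d+)\s+R\b", array_text)]


def out_of_range_annot_refs(doc, page):
    """List xrefs in the page's Annots array that lie outside the xref table.

    Hypothetical helper: `doc` is assumed to be a fitz.Document and `page`
    one of its pages.
    """
    kind, value = doc.xref_get_key(page.xref, "Annots")
    if kind != "array":
        # Missing key or an indirect reference: not handled in this sketch.
        return []
    limit = doc.xref_length()  # valid xref numbers are 1 .. limit - 1
    return [xref for xref in indirect_refs(value) if not 0 < xref < limit]
```

Entries that pass the range check can still be inspected one by one with `doc.xref_object(xref)` to see what they actually contain.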