PyMuPDF: delete only a specific image (same as 1.png) from PDF pages #2708

Answered by JorjMcKie
BeastHunterTest1 asked this question in Q&A
Images in a PDF have no names, so you cannot pick one for deletion by its file name.

This is a property of the PDF format itself and has nothing to do with PyMuPDF or MuPDF.

I recommend the following approach:
Compute a hash code (e.g. MD5) from the binary content of the image file you want to remove.
Then loop over the images in the PDF and extract each one via doc.extract_image(xref), as you did. For each extracted image, compute the hash code from img["image"] and compare it with that of the file.
If the two are equal, delete the image as indicated.
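
The steps above could be sketched roughly as follows. This is only an illustration, not an official PyMuPDF feature: the helper names (md5_of_file, matching_xrefs, delete_matching_images) and the file paths are hypothetical, and the use of page.delete_image(xref) for the actual removal is an assumption about what "as indicated" refers to (it needs a reasonably recent PyMuPDF). The comparison only succeeds when the bytes stored in the PDF are identical to the bytes of the file on disk.

```python
import hashlib


def md5_of_file(path):
    """Return the MD5 hex digest of a file's binary content."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def matching_xrefs(images, target_digest):
    """Given (xref, image_bytes) pairs, return the xrefs whose MD5 equals target_digest."""
    return [
        xref
        for xref, data in images
        if hashlib.md5(data).hexdigest() == target_digest
    ]


def delete_matching_images(pdf_path, image_path, out_path):
    """Delete every PDF image whose bytes hash to the same MD5 as image_path.

    Returns the number of distinct image objects (xrefs) that were deleted.
    """
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    target = md5_of_file(image_path)
    doc = fitz.open(pdf_path)
    seen = set()  # xrefs already examined (one image can appear on many pages)
    removed = 0
    for page in doc:
        pairs = []
        for item in page.get_images(full=True):
            xref = item[0]  # first entry of each image tuple is the xref
            if xref in seen:
                continue
            seen.add(xref)
            img = doc.extract_image(xref)
            if img:  # skip anything that could not be extracted
                pairs.append((xref, img["image"]))
        for xref in matching_xrefs(pairs, target):
            # Assumption: page.delete_image is the deletion method meant above.
            page.delete_image(xref)
            removed += 1
    doc.save(out_path, garbage=3, deflate=True)
    doc.close()
    return removed
```

A call might look like delete_matching_images("input.pdf", "1.png", "output.pdf"), with all three paths being placeholders. The hashing step is kept in a separate function so it can be checked on plain byte strings without opening a PDF.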

A lot can still go wrong, which is why we won't turn this into an official enhancement. For example:

  • If the image file contains an alpha channel (transparency), there is no way to find its PDF equivalent using hash codes
  • I…

Answer selected by BeastHunterTest1
Category
Q&A
Labels
enhancement wontfix no intention to resolve
2 participants
This discussion was converted from issue #2707 on October 02, 2023 13:19.