PyMuPDF delete only specific image same as 1.png from a pdf pages #2708

BeastHunterTest1 · 2023-10-02T13:01:10Z


Hi, I tried some code from ChatGPT to delete all images identical to 1.png from a PDF called 1.pdf. This is the code:

```python
import fitz  # PyMuPDF

def remove_image_from_pdf(input_pdf, output_pdf, image_name):
    # Open the input PDF file
    pdf_document = fitz.open(input_pdf)

    for page_number in range(len(pdf_document)):
        page = pdf_document[page_number]
        xref_list = page.get_images(full=True)

        for xref in xref_list:
            base_image = pdf_document.extract_image(xref[0])
            image_data = base_image["image"]

            if image_data.endswith(b".png") and image_data.get("name") == image_name.encode():
                # Remove the image from the page
                page.apply_redact([xref])
                page.insert_redact(xref, width=0)
                page.apply_redactions()

    # Save the modified PDF to the output file
    pdf_document.save(output_pdf)
    pdf_document.close()

if __name__ == "__main__":
    input_pdf_file = "1.pdf"
    output_pdf_file = "1_remove.pdf"
    image_name_to_remove = "1.png"

    remove_image_from_pdf(input_pdf_file, output_pdf_file, image_name_to_remove)
    print(f"Image '{image_name_to_remove}' removed and saved as '{output_pdf_file}'.")
```

It does save the new PDF but does not remove anything. By the way, I am trying to remove a translation image from the PDF.
Any help would be great. Thanks.


JorjMcKie · 2023-10-02T13:18:59Z

JorjMcKie
Oct 2, 2023
Maintainer

This is a typical "Discussions" item, so I am going to move it there first.

2 replies

BeastHunterTest1 Oct 2, 2023
Author

Hi, sorry, there was only a feature request or bug option.

JorjMcKie Oct 2, 2023
Maintainer

In PDF, images have no name! So you cannot delete them in this way.

This has nothing to do with PyMuPDF or MuPDF!

I recommend the following approach:
For a given image, compute the hash code (like MD5) from its binary content.
Then loop over the images and extract them via doc.extract_image(xref) as you did. For each image extracted in this way, compute the hash code from img["image"] and compare it with that of the file.
If equal, delete the image as indicated.
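
The steps above could be sketched roughly as follows. This is only a minimal sketch, not an official PyMuPDF recipe: the names `md5_of`, `matching_xrefs` and `remove_matching_images` are made up for illustration, and it assumes a PyMuPDF version that provides `Page.delete_image()`. The PyMuPDF import sits inside the last function so that the two hashing helpers depend on nothing but `hashlib`.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def matching_xrefs(images, target_digest):
    """Return the xrefs whose binary content hashes to target_digest.

    `images` is an iterable of (xref, image_bytes) pairs.
    """
    return [xref for xref, data in images if md5_of(data) == target_digest]


def remove_matching_images(input_pdf, output_pdf, image_file):
    """Delete images in input_pdf whose extracted bytes equal image_file's bytes."""
    import fitz  # PyMuPDF; assumed to be installed

    # Hash the reference image file once.
    with open(image_file, "rb") as f:
        target = md5_of(f.read())

    doc = fitz.open(input_pdf)
    removed = 0
    for page in doc:
        # Collect (xref, bytes) for every image referenced by this page.
        pairs = []
        for item in page.get_images(full=True):
            xref = item[0]
            pairs.append((xref, doc.extract_image(xref)["image"]))
        for xref in matching_xrefs(pairs, target):
            # Assumption: Page.delete_image() exists in the installed version.
            page.delete_image(xref)
            removed += 1
    doc.save(output_pdf)
    doc.close()
    return removed
```

With the file names from the question, the call would be `remove_matching_images("1.pdf", "1_remove.pdf", "1.png")`. A return value of 0 means no extracted image matched the file byte for byte, which can happen for the reasons listed next.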

A lot of things may still go wrong - which is the reason why we won't make an official enhancement - e.g.:

- If the image file contains an alpha channel (transparency), there is no way to find its PDF equivalent using hash codes.
- If the image file is black & white, then the PDF compression filter CCITTFaxDecode will probably have been used, and the image extracted via doc.extract_image() will not deliver the same binary.
- ... probably more
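
To see why a different encoding defeats the hash comparison, here is a small stand-alone illustration. It uses `zlib` at two compression levels as a stand-in for "the same picture stored with different encodings"; this is an assumption made for demonstration only and is not what CCITTFaxDecode does internally. The `pixels` value is invented sample data.

```python
import hashlib
import zlib

# Stand-in for decoded pixel data: 64 rows of 64 gray values.
pixels = bytes(range(64)) * 64

# Two valid encodings of the very same pixels.
encoded_fast = zlib.compress(pixels, 1)
encoded_best = zlib.compress(pixels, 9)

# The decoded content is identical ...
same_pixels = zlib.decompress(encoded_fast) == zlib.decompress(encoded_best)
# ... but the stored bytes, and therefore their hashes, are not.
same_bytes = encoded_fast == encoded_best
same_hash = hashlib.md5(encoded_fast).digest() == hashlib.md5(encoded_best).digest()

print(same_pixels, same_bytes, same_hash)
```

The same effect applies when the image inside the PDF was stored with a different filter than the file on disk: the picture looks identical, yet the byte-level hashes differ and no match is found.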

Answer selected by BeastHunterTest1
