Commit 166b007

Add Page method clip_to_rect
1 parent 16cf714 commit 166b007

3 files changed, 58 additions and 0 deletions

docs/page.rst

Lines changed: 11 additions & 0 deletions
@@ -62,6 +62,7 @@ In a nutshell, this is what you can do with PyMuPDF:
 :meth:`Page.annot_xrefs` PDF only: a list of annotation (and widget) xrefs
 :meth:`Page.annots` return a generator over the annots on the page
 :meth:`Page.apply_redactions` PDF only: process the redactions of the page
+:meth:`Page.clip_to_rect` PDF only: remove page content outside a rectangle
 :meth:`Page.bound` rectangle of the page
 :meth:`Page.cluster_drawings` PDF only: bounding boxes of vector graphics
 :meth:`Page.delete_annot` PDF only: delete an annotation
@@ -1961,6 +1962,16 @@ In a nutshell, this is what you can do with PyMuPDF:
 
       These changes are **permanent** and cannot be reverted.
 
+   .. method:: clip_to_rect(rect)
+
+      PDF only: Permanently remove page content outside the given rectangle. This is similar to :meth:`Page.set_cropbox`, but the page's rectangle is not changed: only the content outside the rectangle is removed.
+
+      :arg rect_like rect: The rectangle to clip to. Must be finite, and its intersection with the page must not be empty.
+
+      The method works best for text: every character that has no intersection with the rectangle is removed (the decision is made per character). Vector graphics paths and images are removed only if they have no intersection with the rectangle. Paths and images that **do** intersect the rectangle are kept in their entirety.
+
+      The method has roughly the same effect as applying four redactions that cover the area outside the rectangle.
+
    .. method:: remove_rotation()
 
       PDF only: Set page rotation to 0 while maintaining appearance and page content.
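
The documentation compares the method to four redactions covering the area outside the rectangle. A minimal sketch of that geometry follows, using plain `(x0, y0, x1, y1)` tuples instead of PyMuPDF objects. The helper name `outside_strips` is hypothetical and the A4-sized page is an assumption; this only illustrates the comparison and says nothing about how `pdf_clip_page` works internally.

```python
def outside_strips(page, clip):
    """Return up to four rectangles covering the part of `page` outside `clip`.

    Rectangles are (x0, y0, x1, y1) tuples with y growing downwards.
    Hypothetical helper for illustration only.
    """
    px0, py0, px1, py1 = page
    # Restrict the clip to the page before splitting the remainder.
    cx0, cy0 = max(px0, clip[0]), max(py0, clip[1])
    cx1, cy1 = min(px1, clip[2]), min(py1, clip[3])
    if cx0 >= cx1 or cy0 >= cy1:
        raise ValueError("clip must intersect the page")
    strips = [
        (px0, py0, px1, cy0),  # full-width strip above the clip
        (px0, cy1, px1, py1),  # full-width strip below the clip
        (px0, cy0, cx0, cy1),  # strip to the left of the clip
        (cx1, cy0, px1, cy1),  # strip to the right of the clip
    ]
    # Drop strips with no area (clip touches that page edge).
    return [s for s in strips if s[0] < s[2] and s[1] < s[3]]


page = (0, 0, 595, 842)        # assumed A4 page in points
clip = (200, 200, 400, 500)    # rectangle used in the commit's test
strips = outside_strips(page, clip)
```

The four strips do not overlap, so their combined area equals the page area minus the clip area.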

src/__init__.py

Lines changed: 10 additions & 0 deletions
@@ -8735,6 +8735,16 @@ def recolor(self, components=1):
         ropts = mupdf.PdfRecolorOptions(ropt)
         mupdf.pdf_recolor_page(pdfdoc, self.number, ropts)
 
+    def clip_to_rect(self, rect):
+        """Clip away page content outside the rectangle."""
+        clip = Rect(rect)
+        if clip.is_infinite or (clip & self.rect).is_empty:
+            raise ValueError("rect must not be infinite or empty")
+        clip *= self.transformation_matrix
+        pdfpage = _as_pdf_page(self)
+        pclip = JM_rect_from_py(clip)
+        mupdf.pdf_clip_page(pdfpage, pclip)
+
     @property
     def artbox(self):
         """The ArtBox"""

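The two steps in `clip_to_rect` before the MuPDF call (validate, then transform) can be illustrated without PyMuPDF. In the sketch below, `check_clip` and `flip_y` are hypothetical helpers on plain tuples. `math.isfinite` stands in for `Rect.is_infinite`, and the transformation matrix is assumed to be a plain y-axis flip, which should hold for an unrotated page whose boxes start at the origin; real pages may need the full matrix.

```python
import math


def check_clip(clip, page):
    """Raise ValueError under the same two conditions as the committed check."""
    # Stand-in for Rect.is_infinite: reject any non-finite coordinate.
    if not all(math.isfinite(v) for v in clip):
        raise ValueError("rect must not be infinite or empty")
    # Intersection of clip and page, as in (clip & self.rect).
    x0, y0 = max(clip[0], page[0]), max(clip[1], page[1])
    x1, y1 = min(clip[2], page[2]), min(clip[3], page[3])
    if x0 >= x1 or y0 >= y1:
        raise ValueError("rect must not be infinite or empty")


def flip_y(rect, page_height):
    """Map a top-left-origin rectangle to bottom-left-origin coordinates.

    Assumption: this is all the transformation matrix does for a simple page.
    """
    x0, y0, x1, y1 = rect
    return (x0, page_height - y1, x1, page_height - y0)


page = (0, 0, 595, 842)          # assumed A4 page in points
clip = (200, 200, 400, 500)      # rectangle used in the commit's test
check_clip(clip, page)           # passes silently
pdf_clip = flip_y(clip, 842)     # (200, 342, 400, 642)
```

Note that the committed code transforms the original `clip`, not its intersection with the page; the sketch does the same.
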
tests/test_clip_page.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+"""
+Test Page method clip_to_rect.
+"""
+
+import os
+import pymupdf
+
+
+def test_clip():
+    """
+    Clip a Page to a rectangle and confirm that no text has survived
+    that is completely outside the rectangle.
+    """
+    scriptdir = os.path.dirname(os.path.abspath(__file__))
+    rect = pymupdf.Rect(200, 200, 400, 500)
+    filename = os.path.join(scriptdir, "resources", "v110-changes.pdf")
+    doc = pymupdf.open(filename)
+    page = doc[0]
+    page.clip_to_rect(rect)  # clip the page to the rectangle
+    # capture font warning message of MuPDF
+    assert pymupdf.TOOLS.mupdf_warnings() == "bogus font ascent/descent values (0 / 0)"
+    # extract all text characters and assert that each one
+    # has a non-empty intersection with the rectangle.
+    chars = [
+        c
+        for b in page.get_text("rawdict")["blocks"]
+        for l in b["lines"]
+        for s in l["spans"]
+        for c in s["chars"]
+    ]
+    for char in chars:
+        bbox = pymupdf.Rect(char["bbox"])
+        if bbox.is_empty:
+            continue
+        assert bbox.intersects(
+            rect
+        ), f"Character '{char['c']}' at {bbox} is outside of {rect}."
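
The per-character check in the test can be factored into a small helper that works on any dictionary shaped like the output of `page.get_text("rawdict")`. The sketch below is hypothetical: neither `intersects` nor `chars_outside` is part of the commit, and plain tuples replace `pymupdf.Rect` so that it runs without PyMuPDF. It also skips blocks that have no `"lines"` key, on the assumption that image blocks carry none.

```python
def intersects(a, b):
    """True if rectangles a and b overlap in a region of positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def chars_outside(rawdict, rect):
    """Collect characters whose non-empty bbox does not intersect `rect`."""
    strays = []
    for block in rawdict["blocks"]:
        # Assumption: image blocks have no "lines" entry.
        for line in block.get("lines", []):
            for span in line["spans"]:
                for char in span["chars"]:
                    x0, y0, x1, y1 = char["bbox"]
                    if x0 >= x1 or y0 >= y1:
                        continue  # empty bbox, skipped as in the test
                    if not intersects(char["bbox"], rect):
                        strays.append(char["c"])
    return strays


# Hand-made sample data in the rawdict shape (not taken from a real PDF).
sample = {"blocks": [
    {"type": 0, "lines": [{"spans": [{"chars": [
        {"c": "A", "bbox": (210, 210, 220, 225)},  # inside the rectangle
        {"c": "B", "bbox": (10, 10, 20, 25)},      # outside the rectangle
        {"c": " ", "bbox": (30, 10, 30, 25)},      # empty bbox
    ]}]}]},
    {"type": 1},                                   # image-like block
]}
strays = chars_outside(sample, (200, 200, 400, 500))
```

After a successful clip, a helper like this would be expected to return an empty list for the page's own `rawdict`.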
