
Get text from a pdf text stream object #2793

Answered by JorjMcKie
Mark-Joy asked this question in Q&A

Interesting.
You could try this:

import fitz

doc = fitz.open()
page = doc.new_page()
page.insert_font()  # register Helvetica in the page's resources under its default name "helv"
xref = doc.get_new_xref()  # create a new xref
doc.update_object(xref, "<<>>")  # make it a PDF dictionary

# your content stream, as a bytes object
stream = (
    b"BT\n1 0 0 1 0 0 Tm\n/F1 23.00093 Tf\n27.60112 TL\n"
    b"0.99996 0.009 -0.009 0.99996 1110 1326 Tm\n0 0 Td\n"
    b"135.997 Tz\n<702e20> Tj\n47 0 Td\n138.3741 Tz\n<3234382e20> Tj\nET"
)

# replace the unknown font name /F1 with /helv, the name insert_font() registered for Helvetica
stream = stream.replace(b"/F1", b"/helv")  # "helv" is PyMuPDF's reserved name for Helvetica

doc.update_stream(xref, stream)  # insert into our new object
page.set_contents(xref)  # define this to be the page's…
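If you only want to see which characters the `Tj` operators in such a stream literally show, a rough cross-check without any PDF library is possible. This sketch is not part of the answer above: the helper name `decode_hex_tj` is hypothetical, it handles only hex-string operands of `Tj`, it ignores `TJ` arrays, literal `(...)` strings and `/ToUnicode` CMaps, and it assumes a simple single-byte font encoding (roughly what the `/helv` substitution assumes).

```python
import re

# Matches a hex-string operand followed by the Tj operator, e.g. "<702e20> Tj".
# Assumption: literal "(...)" strings and TJ arrays are deliberately not handled.
HEX_TJ = re.compile(rb"<([0-9A-Fa-f\s]*)>\s*Tj")


def decode_hex_tj(stream: bytes) -> list[str]:
    """Return the text shown by each hex-string Tj operator in a content stream.

    Hypothetical helper. Assumes a simple single-byte font encoding and maps
    each byte to one character via Latin-1, which matches WinAnsiEncoding for
    printable ASCII but not for bytes 0x80-0x9F.
    """
    pieces = []
    for match in HEX_TJ.finditer(stream):
        hex_digits = re.sub(rb"\s", b"", match.group(1))  # whitespace is allowed inside <...>
        if len(hex_digits) % 2:  # the PDF spec treats a missing final digit as 0
            hex_digits += b"0"
        pieces.append(bytes.fromhex(hex_digits.decode("ascii")).decode("latin-1"))
    return pieces


stream = (
    b"BT\n1 0 0 1 0 0 Tm\n/F1 23.00093 Tf\n27.60112 TL\n"
    b"0.99996 0.009 -0.009 0.99996 1110 1326 Tm\n0 0 Td\n"
    b"135.997 Tz\n<702e20> Tj\n47 0 Td\n138.3741 Tz\n<3234382e20> Tj\nET"
)
print(decode_hex_tj(stream))  # ['p. ', '248. ']
```

In a real document the font's encoding and any `/ToUnicode` CMap decide what each byte means, so PyMuPDF's own text extraction remains the more reliable route; this helper is only a quick sanity check of what the stream contains.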

Replies: 1 comment 2 replies

2 replies
@Mark-Joy

@JorjMcKie

Answer selected by Mark-Joy