Skip to content

fix: Use --input-path/--output-path as base_dir in json+binref encoder #304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

nmheim
Copy link
Contributor

@nmheim nmheim commented Aug 1, 2025

Relevant issue or PR

When passing json+binref encoded payloads to a tesseract, the base_dir (which stores the .bin files) was not set.
This would not be an issue if the payload would be encoded with a proper InputFileReference, but I don't think we can assume this in all cases.

Description of changes

Use input path/output path as the default for base_dir (if it is not explicitly set)

Testing done

CI passes.

Copy link

codecov bot commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 72.72727% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.94%. Comparing base (366c328) to head (7e48fe0).

Files with missing lines Patch % Lines
tesseract_core/sdk/engine.py 0.00% 2 Missing and 2 partials ⚠️
tesseract_core/runtime/array_encoding.py 71.42% 1 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (366c328) and HEAD (7e48fe0). Click for more details.

HEAD has 13 uploads less than BASE
Flag BASE (366c328) HEAD (7e48fe0)
25 12
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #304      +/-   ##
==========================================
- Coverage   78.39%   71.94%   -6.45%     
==========================================
  Files          29       29              
  Lines        3249     3240       -9     
  Branches      501      500       -1     
==========================================
- Hits         2547     2331     -216     
- Misses        484      697     +213     
+ Partials      218      212       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@nmheim
Copy link
Contributor Author

nmheim commented Aug 5, 2025

It seems like logger.warnings don't end up in stderr anymore. Should I be using LogPipe? if you have pointers I'd be happy to fix it:)

@dionhaefner
Copy link
Contributor

Thanks, I can take it from here!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those functions were a bit bare-bones after the msgpack deprecation, inlined instead.

Comment on lines -423 to +404
param=InputSchema,
param_hint=InputSchema.__name__,
param_hint="payload",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by change: wrong usage

@@ -628,7 +628,7 @@ def serve(
logger.info(
f"Tesseract container name, use it with 'tesseract teardown' command: {container_name}"
)
container_meta = {"container_name": container_name, "containers": container_ports}
container_meta = {"container_name": container_name, "containers": [container_ports]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by change: "containers" implies a collection, not a singular value, so I turned it into a list here.

@@ -548,6 +512,113 @@ def serve_tesseract():
assert (tmp_path / "bar").exists()


@pytest.mark.parametrize("method", ["run", "serve"])
@pytest.mark.parametrize("array_format", ["json", "json+base64", "json+binref"])
def test_io_path_interactions(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meat of this PR, the new test ensures everything is e2e consistent across all formats and modes of invocation.

@@ -650,39 +721,6 @@ def _get_host_ip():
requests.get(f"http://localhost:{container.host_port}/health")


def test_tesseract_cli_options_parsing(built_image_name, tmpdir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered by new test

@@ -724,8 +762,8 @@ def apply(inputs: InputSchema) -> OutputSchema:
docker_cleanup["images"].append(img_tag)


def logging_tesseract_api_string():
return dedent(
def build_logging_tesseract(workdir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by refactor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants