-
Notifications
You must be signed in to change notification settings - Fork 790
Support returning multi-modal content from tools #1517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
747131e
to
bffbe4f
Compare
PR Change SummaryEnhanced support for returning multi-modal content from tools, ensuring compatibility with various models.
Modified Files
How can I customize these reviews?Check out the Hyperlint AI Reviewer docs for more information on how to customize the review. If you just want to ignore it on this PR, you can add the Note specifically for link checks, we only check the first 30 links in a file and we cache the results for several hours (for instance, if you just added a page, you might experience this). Our recommendation is to add |
@Kludex Please review! |
d9b7fb1
to
aca752f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with the models that don't support this? I don't see mistral, groq, etc.
@Kludex The same thing that would happen if a developer tries to pass unsupported multi-modal content to the LLM as a user part: PydanticAI raises an error. I think that's better than silently serializing the content part as JSON instead, as just passing along the URL would be unlikely to have the desired result.
I've added a test for Groq. I can add one for Mistral as well; is there a particular reason we're not actually calling to the API in |
Hmm, looks like Mistral, unlike the other 4 I tested, doesn't like this trick:
Edit: Got it to work! |
e7f3e04
to
cf49b0c
Compare
cf49b0c
to
c1c7392
Compare
@Kludex Ready for review again! |
@DouweM Does this also cover Bedrock (Claude)? |
@TheFirstMe The implementation is model-agnostic -- if the model support multi-modal input, PydanticAI will try to send the file after the tool output. I haven't specifically tested Bedrock though; if you have an API token and could give it a try it'd be much appreciated! |
@DouweM I have tested and can verify that it works with bedrock. Thanks! When can we expect this to be released? |
# Conflicts: # tests/models/test_openai.py
…-llama/llama-4-scout-17b-16e-instruct instead
2d3dec7
to
3ee3ca1
Compare
LGTM, just one thing I'm concerned about is that claude actually supports tool return image, and I'm not sure there will be a difference between that. I'm guessing the impact is minimal and I'll test it in my case later. |
Fixes #1497