feat: Add streaming support for Mistral v11 tool format #20503
Conversation
Co-authored-by: avigny <[email protected]>
Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
Co-authored-by: aider (gemini/gemini-2.5-pro) <[email protected]>
Signed-off-by: Jeff Cook <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Summary of Changes
Hello @sjuxax, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces comprehensive streaming support for Mistral's v11 tool calling format by completely overhauling the existing tool parsing mechanism. The changes focus on robustly handling incremental output, dynamically adapting to the new format, and improving overall parsing efficiency, ensuring seamless tool integration during model generation.
Highlights
- Streaming Tool Call Support: I have significantly improved the Mistral tool calling parser to add robust streaming support for the new Mistral v11 tool format. This includes handling incremental parsing of tool names and arguments.
- Refactored Parsing Logic: The core streaming parsing logic within `MistralToolParser` has been completely re-architected. This involves introducing a new state machine (a `StreamingState` enum) and dedicated state variables for more precise and efficient parsing of tool calls as they stream in.
- Dynamic Format Detection: The parser now dynamically detects whether the incoming tool call format is the traditional JSON array or the newer Mistral v11 format (e.g., `ToolName{arguments}`), adapting its parsing strategy accordingly.
- Performance Optimizations: I've implemented optimized regex patterns and caching mechanisms for JSON parsing to enhance the performance of tool call extraction during streaming.
- Comprehensive Testing: A new, extensive test suite has been added to validate both non-streaming and streaming tool call extraction for various scenarios, including single and multiple tool calls, different argument types, and the new v11 format.
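The dynamic format detection described above can be sketched roughly as follows. This is an illustration only: `detect_tool_format` and `V11_NAME_RE` are assumed names, not identifiers from the PR.

```python
import re

# Illustrative sketch: decide whether the text following the [TOOL_CALLS]
# token looks like the legacy JSON array of calls or the newer v11
# `ToolName{arguments}` form.
V11_NAME_RE = re.compile(r"^\s*([A-Za-z_][\w-]*)\s*\{")

def detect_tool_format(raw: str) -> str:
    stripped = raw.lstrip()
    if stripped.startswith("["):
        # e.g. [{"name": "get_weather", "arguments": {"location": "Paris"}}]
        return "json_array"
    if V11_NAME_RE.match(stripped):
        # e.g. get_weather{"location": "Paris"}
        return "v11"
    return "unknown"
```

During streaming, a check like this would run once enough of the prefix has accumulated to disambiguate, after which the parser commits to one strategy.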
Code Review
This pull request introduces streaming support for the Mistral v11 tool format. While the refactoring to a state machine is a positive step, there are critical issues related to correctness and robustness, particularly in the v11 parsing logic and test coverage. Addressing these is essential for the stability of the new feature.
@sjuxax hey! I've tested your solution here and it seems to be working, nice job! For non-streaming, I think we can fix it by replacing lines 510:535 with this:

I've also commented this on #19425, but I'm mentioning it here as well!
# Core streaming state
self.raw_tool_calls: str = ""
self.streaming_state: StreamingState = StreamingState.WAITING_FOR_TOOL_START
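For context, a `StreamingState` enum of roughly this shape is what the initializer above refers to. Only `WAITING_FOR_TOOL_START` is visible in the diff excerpt, so the remaining members here are hypothetical placeholders:

```python
from enum import Enum, auto

class StreamingState(Enum):
    # Only WAITING_FOR_TOOL_START appears in the excerpt above; the other
    # members are hypothetical, for illustration of the state machine shape.
    WAITING_FOR_TOOL_START = auto()  # [TOOL_CALLS] token not yet seen
    PARSING_NAME = auto()            # accumulating the tool name (v11 form)
    PARSING_ARGUMENTS = auto()       # streaming the JSON arguments object
    TOOL_COMPLETE = auto()           # one call finished; another may follow

# The parser starts in the waiting state, matching the initializer above.
streaming_state = StreamingState.WAITING_FOR_TOOL_START
```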
Can you fix these ruff errors?
Co-authored-by: Aaron Pham <[email protected]>
…sed tools Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
…ex and JSON decoding Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
… JSON corruption Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
… and using offset-based parsing Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
…mpatibility Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
Co-authored-by: aider (claude-opus-4-20250514) <[email protected]>
Addressed Gemini's comments with Sonnet/Opus. I've been using these changes on my Mistral3.1-rebase branch with success for the last week or so. @avigny, will take a look at your tests and probably pull them in in place of the Opus-autobuilt ones tomorrow. @PedroMiolaSilva, thanks for persistently posting that snippet. I'll test and pull it in tomorrow too.
Here is a snippet which reproduces some errors with Mistral Small 3.2 with commit b521f50:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]

is_stream = False  # <--- try also with True

out = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "Where is colder tomorrow San Francisco or New York?"}],
    tools=tools,
    stream=is_stream,
    temperature=0,
)

if is_stream:
    for chunk in out:
        print(chunk.choices[0])
else:
    print(out.choices[0])
```

when I run it with
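When `is_stream` is `True`, each chunk carries only a delta of the tool call. A minimal, SDK-agnostic way to reassemble the streamed pieces is sketched below; it operates on plain dicts mirroring the OpenAI delta shape rather than the SDK objects above, and `collect_tool_calls` is an illustrative helper, not part of the PR:

```python
# Sketch: accumulate OpenAI-style streamed tool-call deltas into whole calls.
# Chunks are modeled as plain dicts with the same shape as the SDK's deltas.
def collect_tool_calls(chunks):
    calls = {}  # tool-call index -> {"name": ..., "arguments": ...}
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        for tc in delta.get("tool_calls") or []:
            entry = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function") or {}
            if fn.get("name"):
                entry["name"] = fn["name"]
            # Argument fragments arrive as string pieces and must be concatenated.
            entry["arguments"] += fn.get("arguments", "")
    return calls
```

A correct streaming parser should yield deltas that, concatenated this way, form valid JSON arguments; the bugs discussed in this thread show up as malformed output at exactly this reassembly step.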
Follow-up to #19425
Fixes #20028
Purpose
Based on avigny's work in #19425, we substantially improve the Mistral tool calling parser to handle the tool call format emitted by MistralTokenizer v11.
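For reference, the v11 format emits each call as a bare function name immediately followed by a JSON arguments object (e.g. `get_weather{"location": "Paris"}`), instead of the older JSON array of call objects. A minimal parse of one complete, non-streamed v11 call might look like this; the function name and regex are illustrative, not the parser's actual code:

```python
import json
import re

def parse_v11_call(text: str):
    # Split `ToolName{...}` into (name, parsed arguments); sketch only.
    m = re.match(r"\s*([A-Za-z_][\w-]*)\s*(\{.*\})\s*$", text, re.DOTALL)
    if m is None:
        raise ValueError("not a v11-format tool call")
    return m.group(1), json.loads(m.group(2))

name, args = parse_v11_call('get_weather{"location": "Paris", "unit": "celsius"}')
```

The hard part this PR addresses is doing the equivalent incrementally, when the name and the arguments object arrive spread across many streamed deltas.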
Test Plan
Used avigny's test suite attached to #19425; it passes.
Test Result
Tested it in streaming mode on https://huggingface.co/jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym, and it works. I didn't test non-streaming or other checkpoints, so I'm not sure whether they work yet.