feat: Add streaming support for Mistral v11 tool format #20503

Open
wants to merge 12 commits into main
Conversation

Contributor

@sjuxax sjuxax commented Jul 4, 2025

Follow-up to #19425
Fixes #20028

Purpose

Based on avigny's work in #19425, we substantially improve the Mistral tool calling parser to handle the tool call format in MistralTokenizer v11.

Test Plan

Ran avigny's test suite attached to #19425; it passes.

Test Result

Tested streaming against https://huggingface.co/jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym; it works. Non-streaming and other checkpoints are untested, so their status is unknown.
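For reference, a typical way to serve this checkpoint so the Mistral tool parser is exercised (the exact launch command isn't stated in the PR, so treat this as an assumed invocation based on vLLM's documented tool-calling flags):

# assumed invocation; flags are vLLM's documented tool-calling options
vllm serve jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym \
  --enable-auto-tool-choice \
  --tool-call-parser mistral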

sjuxax and others added 2 commits July 4, 2025 13:38
Co-authored-by: avigny <[email protected]>
Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
Co-authored-by: aider (gemini/gemini-2.5-pro) <[email protected]>
Signed-off-by: Jeff Cook <[email protected]>
@sjuxax sjuxax requested a review from aarnphm as a code owner July 4, 2025 20:30

github-actions bot commented Jul 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @sjuxax, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive streaming support for Mistral's v11 tool calling format by completely overhauling the existing tool parsing mechanism. The changes focus on robustly handling incremental output, dynamically adapting to the new format, and improving overall parsing efficiency, ensuring seamless tool integration during model generation.

Highlights

  • Streaming Tool Call Support: I have significantly improved the Mistral tool calling parser to add robust streaming support for the new Mistral v11 tool format. This includes handling incremental parsing of tool names and arguments.
  • Refactored Parsing Logic: The core streaming parsing logic within MistralToolParser has been completely re-architected, introducing a new state machine (StreamingState enum) and dedicated state variables for more precise and efficient parsing of tool calls as they stream in (a minimal sketch follows this list).
  • Dynamic Format Detection: The parser now dynamically detects whether the incoming tool call format is the traditional JSON array or the newer Mistral v11 format (e.g., ToolName{arguments}), adapting its parsing strategy accordingly.
  • Performance Optimizations: I've implemented optimized regex patterns and caching mechanisms for JSON parsing to enhance the performance of tool call extraction during streaming.
  • Comprehensive Testing: A new, extensive test suite has been added to validate both non-streaming and streaming tool call extraction for various scenarios, including single and multiple tool calls, different argument types, and the new v11 format.
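To make the state-machine and format-detection highlights concrete, here is a minimal illustrative sketch. Only StreamingState.WAITING_FOR_TOOL_START is confirmed by the diff excerpt later in this thread; the other state names and the detection regex are assumptions, not the PR's actual code.

import re
from enum import Enum, auto

class StreamingState(Enum):
    # WAITING_FOR_TOOL_START appears in the PR diff; the remaining
    # states are hypothetical names for what an incremental parser needs.
    WAITING_FOR_TOOL_START = auto()
    PARSING_NAME = auto()       # hypothetical
    PARSING_ARGUMENTS = auto()  # hypothetical
    TOOL_COMPLETE = auto()      # hypothetical

# Illustrative v11 detection: v11 emits `ToolName{...}` where the legacy
# format emits a JSON array starting with '['. The PR's real fn_name_regex
# may differ.
V11_CALL = re.compile(r"^[a-zA-Z0-9_-]+\{")

def detect_format(tool_content: str) -> str:
    """Classify raw tool-call text as v11 (name{args}) or a legacy JSON array."""
    stripped = tool_content.lstrip()
    if stripped.startswith("["):
        return "json_array"
    if V11_CALL.match(stripped):
        return "v11"
    return "unknown"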

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces streaming support for the Mistral v11 tool format. While the refactoring to a state machine is a positive step, there are critical issues related to correctness and robustness, particularly in the v11 parsing logic and test coverage. Addressing these is essential for the stability of the new feature.

@PedroMiolaSilva

@sjuxax hey! I've tested your solution here and it seems to be working, nice job!

For non-streaming, I think we can fix it by replacing

lines 510:535:

            # jsons is difficult
            try:
                if self.fn_name_regex:
                    matches = self.fn_name_regex.findall(tool_content)

                    function_call_arr = []
                    for match in matches:
                        fn_name = match[0]
                        args = match[1]

                        # fn_name is encoded outside serialized json dump
                        # only arguments are serialized
                        function_call_arr.append({
                            "name": fn_name,
                            "arguments": json.loads(args)
                        })
                else:
                    function_call_arr = json.loads(tool_content)
            except json.JSONDecodeError:
                # use a regex to find the part corresponding to the tool call.
                # NOTE: This use case should not happen if the model is trained
                # correctly. It's an easy possible fix so it's included, but
                # can be brittle for very complex / highly nested tool calls
                raw_tool_call = self.tool_call_regex.findall(tool_content)[0]
                function_call_arr = json.loads(raw_tool_call)

with this:

            # First, split on the tool-call token; the first item is empty, so discard it
            raw_tool_calls = model_output.split(self.bot_token)[1:]
            function_call_arr = []
            for raw_tool_call in raw_tool_calls:
                tool_name = raw_tool_call.split("{")[0]
                tool_arguments_begin = raw_tool_call.find("{")
                tool_arguments = raw_tool_call[tool_arguments_begin:]
                function_call_arr.append({
                    "name": tool_name,
                    "arguments": json.loads(tool_arguments),
                })
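As a quick sanity check (assuming self.bot_token is Mistral's [TOOL_CALLS] token and that the model emits one token per call, which is what the split implies), the snippet behaves like this on a v11-style output:

import json

BOT_TOKEN = "[TOOL_CALLS]"  # stands in for self.bot_token
model_output = (
    '[TOOL_CALLS]get_weather{"location": "San Francisco, CA", "unit": "celsius"}'
    '[TOOL_CALLS]get_weather{"location": "New York, NY", "unit": "celsius"}'
)

function_call_arr = []
for raw_tool_call in model_output.split(BOT_TOKEN)[1:]:
    name, rest = raw_tool_call.split("{", 1)
    function_call_arr.append({"name": name, "arguments": json.loads("{" + rest)})

print([c["name"] for c in function_call_arr])  # ['get_weather', 'get_weather']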

I've commented this on #19425 as well, but mentioning it here too!

@avigny

avigny commented Jul 8, 2025

@sjuxax
I've updated the tests for streaming extraction using the new format, in #19425
Feel free to cherry-pick them if you want ;)


# Core streaming state
self.raw_tool_calls: str = ""
self.streaming_state: StreamingState = StreamingState.WAITING_FOR_TOOL_START
Collaborator


Can you fix these ruff errors?

sjuxax and others added 8 commits July 12, 2025 12:26
…sed tools

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
…ex and JSON decoding

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
… JSON corruption

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
… and using offset-based parsing

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
…mpatibility

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
@sjuxax
Contributor Author

sjuxax commented Jul 12, 2025

Addressed Gemini's comments with Sonnet/Opus. I've been using these changes on my Mistral3.1-rebase branch with success for the last week or so.

@avigny, I'll take a look at your tests tomorrow and probably pull them in to replace the Opus-autobuilt ones.

@PedroMiolaSilva, thanks for persistently posting that snippet. I'll test and pull it in tomorrow too.

@hibukipanim

Here is a snippet that reproduces some errors with Mistral Small 3.2 at commit b521f50

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]

is_stream = False # <--- try also with True
out = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "Where is colder tomorrow San Francisco or New York?"}],
    tools=tools,
    stream=is_stream,
    temperature=0,
)

if is_stream:
    for chunk in out:
        print(chunk.choices[0])
else:
    print(out.choices[0])

When I run it with is_stream=False, the tool parser throws a JSONDecodeError.
When run with is_stream=True, it doesn't error but returns only the first tool call (there are 2 here with temperature=0).
This snippet does work well with #19425, though.
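For what it's worth, the streaming half of the bug is easiest to see by accumulating the tool-call deltas the server streams back. A minimal sketch using the OpenAI SDK's standard delta format (run the snippet above with is_stream = True; this is ordinary client-side accumulation, not part of the fix):

# Reassemble streamed tool calls keyed by delta index; with this bug,
# only index 0 ever appears even though two calls are expected.
calls = {}
for chunk in out:
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments
print(calls)  # expected: two get_weather entries; observed: only the first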

Development

Successfully merging this pull request may close these issues.

[Bug]: Streaming tool call is not working for Mistral Small 3.2