feat: Add streaming support for Mistral v11 tool format #20503

Open
wants to merge 12 commits into main
Conversation

Contributor

@sjuxax sjuxax commented Jul 4, 2025

Follow-up to #19425
Fixes #20028

Purpose

Based on avigny's work in #19425, we substantially improve the Mistral tool calling parser to handle the tool call format in MistralTokenizer v11.

Test Plan

Ran avigny's test suite attached to #19425; it passes.

Test Result

Tested streaming against https://huggingface.co/jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym; it works. Non-streaming and other checkpoints are untested, so their status is unknown.
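For reference, a typical way to serve this checkpoint so the Mistral tool parser is exercised (the exact launch command isn't stated in the PR, so treat this as an assumed invocation based on vLLM's documented tool-calling flags):

# assumed invocation; flags are vLLM's documented tool-calling options
vllm serve jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym \
  --enable-auto-tool-choice \
  --tool-call-parser mistral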

sjuxax and others added 2 commits July 4, 2025 13:38
Co-authored-by: avigny <[email protected]>
Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
Co-authored-by: aider (gemini/gemini-2.5-pro) <[email protected]>
Signed-off-by: Jeff Cook <[email protected]>
@sjuxax sjuxax requested a review from aarnphm as a code owner July 4, 2025 20:30

github-actions bot commented Jul 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @sjuxax, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive streaming support for Mistral's v11 tool calling format by completely overhauling the existing tool parsing mechanism. The changes focus on robustly handling incremental output, dynamically adapting to the new format, and improving overall parsing efficiency, ensuring seamless tool integration during model generation.

Highlights

  • Streaming Tool Call Support: I have significantly improved the Mistral tool calling parser to add robust streaming support for the new Mistral v11 tool format. This includes handling incremental parsing of tool names and arguments.
  • Refactored Parsing Logic: The core streaming parsing logic within MistralToolParser has been completely re-architected, introducing a new state machine (StreamingState enum) and dedicated state variables for more precise and efficient parsing of tool calls as they stream in (a minimal sketch follows this list).
  • Dynamic Format Detection: The parser now dynamically detects whether the incoming tool call format is the traditional JSON array or the newer Mistral v11 format (e.g., ToolName{arguments}), adapting its parsing strategy accordingly.
  • Performance Optimizations: I've implemented optimized regex patterns and caching mechanisms for JSON parsing to enhance the performance of tool call extraction during streaming.
  • Comprehensive Testing: A new, extensive test suite has been added to validate both non-streaming and streaming tool call extraction for various scenarios, including single and multiple tool calls, different argument types, and the new v11 format.
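To make the state-machine and format-detection highlights concrete, here is a minimal illustrative sketch. Only StreamingState.WAITING_FOR_TOOL_START is confirmed by the diff excerpt later in this thread; the other state names and the detection regex are assumptions, not the PR's actual code.

import re
from enum import Enum, auto

class StreamingState(Enum):
    # WAITING_FOR_TOOL_START appears in the PR diff; the remaining
    # states are hypothetical names for what an incremental parser needs.
    WAITING_FOR_TOOL_START = auto()
    PARSING_NAME = auto()       # hypothetical
    PARSING_ARGUMENTS = auto()  # hypothetical
    TOOL_COMPLETE = auto()      # hypothetical

# Illustrative v11 detection: v11 emits `ToolName{...}` where the legacy
# format emits a JSON array starting with '['. The PR's real fn_name_regex
# may differ.
V11_CALL = re.compile(r"^[a-zA-Z0-9_-]+\{")

def detect_format(tool_content: str) -> str:
    """Classify raw tool-call text as v11 (name{args}) or a legacy JSON array."""
    stripped = tool_content.lstrip()
    if stripped.startswith("["):
        return "json_array"
    if V11_CALL.match(stripped):
        return "v11"
    return "unknown"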

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces streaming support for the Mistral v11 tool format. While the refactoring to a state machine is a positive step, there are critical issues related to correctness and robustness, particularly in the v11 parsing logic and test coverage. Addressing these is essential for the stability of the new feature.

@PedroMiolaSilva

@sjuxax hey! I've tested your solution here and it seems to be working, nice job!

For non-streaming, I think we can fix it by replacing

lines 510:535:

            # jsons is difficult
            try:
                if self.fn_name_regex:
                    matches = self.fn_name_regex.findall(tool_content)

                    function_call_arr = []
                    for match in matches:
                        fn_name = match[0]
                        args = match[1]

                        # fn_name is encoded outside serialized json dump
                        # only arguments are serialized
                        function_call_arr.append({
                            "name": fn_name,
                            "arguments": json.loads(args)
                        })
                else:
                    function_call_arr = json.loads(tool_content)
            except json.JSONDecodeError:
                # use a regex to find the part corresponding to the tool call.
                # NOTE: This use case should not happen if the model is trained
                # correctly. It's an easy possible fix so it's included, but
                # can be brittle for very complex / highly nested tool calls
                raw_tool_call = self.tool_call_regex.findall(tool_content)[0]
                function_call_arr = json.loads(raw_tool_call)

with this:

            # First, split on the tool-call token; the first item is empty, so discard it
            raw_tool_calls = model_output.split(self.bot_token)[1:]
            function_call_arr = []
            for raw_tool_call in raw_tool_calls:
                tool_name = raw_tool_call.split("{")[0]
                tool_arguments_begin = raw_tool_call.find("{")
                tool_arguments = raw_tool_call[tool_arguments_begin:]
                function_call_arr.append({
                    "name": tool_name,
                    "arguments": json.loads(tool_arguments),
                })
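As a quick sanity check (assuming self.bot_token is Mistral's [TOOL_CALLS] token and that the model emits one token per call, which is what the split implies), the snippet behaves like this on a v11-style output:

import json

BOT_TOKEN = "[TOOL_CALLS]"  # stands in for self.bot_token
model_output = (
    '[TOOL_CALLS]get_weather{"location": "San Francisco, CA", "unit": "celsius"}'
    '[TOOL_CALLS]get_weather{"location": "New York, NY", "unit": "celsius"}'
)

function_call_arr = []
for raw_tool_call in model_output.split(BOT_TOKEN)[1:]:
    name, rest = raw_tool_call.split("{", 1)
    function_call_arr.append({"name": name, "arguments": json.loads("{" + rest)})

print([c["name"] for c in function_call_arr])  # ['get_weather', 'get_weather']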

I've commented this on #19425 as well, but mentioning it here too!

@avigny

avigny commented Jul 8, 2025

@sjuxax
I've updated the tests for streaming extraction using the new format, in #19425
Feel free to cherry-pick them if you want ;)


# Core streaming state
self.raw_tool_calls: str = ""
self.streaming_state: StreamingState = StreamingState.WAITING_FOR_TOOL_START
Collaborator


Can you fix these ruff errors?

sjuxax and others added 8 commits July 12, 2025 12:26
…sed tools

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
…ex and JSON decoding

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
… JSON corruption

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
… and using offset-based parsing

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
…mpatibility

Co-authored-by: aider (anthropic/claude-sonnet-4-20250514) <[email protected]>
@sjuxax
Contributor Author

sjuxax commented Jul 12, 2025

Addressed Gemini's comments with Sonnet/Opus. I've been using these changes on my Mistral3.1-rebase branch with success for the last week or so.

@avigny, I'll take a look at your tests tomorrow and probably pull them in to replace the Opus-autobuilt ones.

@PedroMiolaSilva, thanks for persistently posting that snippet. I'll test and pull it in tomorrow too.

@hibukipanim

Here is a snippet that reproduces some errors with Mistral Small 3.2 at commit b521f50

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]

is_stream = False # <--- try also with True
out = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "Where is colder tomorrow San Francisco or New York?"}],
    tools=tools,
    stream=is_stream,
    temperature=0,
)

if is_stream:
    for chunk in out:
        print(chunk.choices[0])
else:
    print(out.choices[0])

When I run it with is_stream=False, the tool parser throws a JSONDecodeError.
When run with is_stream=True, it doesn't error but returns only the first tool call (there are 2 here with temperature=0).
This snippet does work well with #19425, though.
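For what it's worth, the streaming half of the bug is easiest to see by accumulating the tool-call deltas the server streams back. A minimal sketch using the OpenAI SDK's standard delta format (run the snippet above with is_stream = True; this is ordinary client-side accumulation, not part of the fix):

# Reassemble streamed tool calls keyed by delta index; with this bug,
# only index 0 ever appears even though two calls are expected.
calls = {}
for chunk in out:
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments
print(calls)  # expected: two get_weather entries; observed: only the first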

Development

Successfully merging this pull request may close these issues.

[Bug]: Streaming tool call is not working for Mistral Small 3.2