
[Frontend] Expand tools even if tool_choice="none" #17177


Merged: 12 commits merged into vllm-project:main from specifiable_tools_expansion on Jul 1, 2025

Conversation

@okdshin (Contributor) commented Apr 25, 2025

Add option to include tool definitions even when tool_choice is 'none'

Summary

This PR adds a new command-line option --expand-tools-even-if-tool-choice-none which allows including tool definitions in prompts even when tool_choice='none'.

Motivation

In the current implementation, when tool_choice is set to 'none', all tool definitions are removed from the request, preventing the model from seeing the tool schemas. This change enables a workflow where:

  1. The model can be aware of available tools (via their definitions in the prompt)
  2. But is not expected to use them automatically (since tool_choice='none')

This is useful for:

  • Models that need to reason ("plan") about the available tools before using them in subsequent requests
  • Cases where tool descriptions provide useful context even if the tools aren't used
  • Situations where you want the model to reference tool capabilities in its response without actually calling them

Implementation

  • Added a new CLI flag --expand-tools-even-if-tool-choice-none (default: False)
  • Modified the request validation in protocol.py to no longer remove tools when tool_choice='none'
  • Added the new parameter to OpenAIServingChat and passed it through from the API server (see the usage sketch below)
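Example

A hypothetical invocation (the model name is a placeholder; the flag name is as introduced by this PR):

```bash
vllm serve <model> --expand-tools-even-if-tool-choice-none
```

With the flag set, a chat request that supplies tools but sets tool_choice="none" still has the tool definitions rendered into the prompt; without it, they are stripped as before.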


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run further CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the frontend label on Apr 25, 2025
@okdshin force-pushed the specifiable_tools_expansion branch from d0de334 to d272dbe on June 13, 2025
@okdshin requested a review from aarnphm as a code owner on June 13, 2025
@okdshin force-pushed the specifiable_tools_expansion branch 4 times, most recently from ef2bf5f to 25aec8d on June 14, 2025
@okdshin (Contributor, Author) commented Jun 16, 2025

Hi @aarnphm, I've fixed the CI issues and updated this PR. The tests are now passing - could you please take another look when you have a chance? Thanks!

@mgoin requested a review from russellb on June 16, 2025
@aarnphm (Collaborator) left a comment


Do you have a use case that requires the tool definitions even if tool_choice="none"?

From a high-level POV, including tool definitions when you don't want to use tools seems like a waste of context for small/medium-size models.

@russellb (Member)

Is the use case that you'd like to allow tool calls to be (potentially) generated, so you want all of the tool definitions included, but you don't want the API server trying to parse any calls if they appear?

I'm just trying to make sure I understand properly.

@russellb (Member)

I'm not a big fan of the new option, but honestly, the new behavior makes more sense to me. If tools have been defined in a request, including that data seems right; whether it should be included is already implied by whether the tools appear in the request at all.

@okdshin (Contributor, Author) commented Jun 17, 2025

Currently we achieve tool_choice="auto" behavior by sending two requests from a reverse-proxy server to vLLM's API server:

  • tool_choice="none" and guided_json={list_of_one_of_signature_of_tools}
  • tool_choice="none"

If the first response is an empty JSON list, we assume tool use was unnecessary and just use the second response. This is required because we want guided decoding (rather than relying on the tool_parser) to enforce that any function call strictly matches the expected signature.
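A minimal sketch of this two-request pattern, assuming an OpenAI-compatible client against a vLLM server; the model name, tool, and schema are illustrative, and guided_json is vLLM's guided-decoding extension passed via extra_body:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition; with this PR's behavior, it is rendered into
# the prompt even though tool_choice="none".
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Illustrative schema for "a list of tool calls, possibly empty".
tool_calls_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "arguments": {"type": "object"},
        },
        "required": ["name", "arguments"],
    },
}

messages = [{"role": "user", "content": "What's the weather in Boston?"}]

# Request 1: guided decoding constrains the output to the tool-call schema.
first = client.chat.completions.create(
    model="<model>",
    messages=messages,
    tools=tools,
    tool_choice="none",
    extra_body={"guided_json": tool_calls_schema},
)

# Request 2: unconstrained free-form answer.
second = client.chat.completions.create(
    model="<model>",
    messages=messages,
    tools=tools,
    tool_choice="none",
)

# An empty list from the guided request means no tool was needed, so the
# free-form answer is used instead.
calls = json.loads(first.choices[0].message.content)
result = calls if calls else second.choices[0].message.content
```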

@aarnphm (Collaborator) commented Jun 17, 2025

FYI, tool_choice="required" uses structured outputs here.

But I think I understand this case.

@okdshin (Contributor, Author) commented Jun 17, 2025

> I'm not a big fan of the new option,

Honestly, I agree with @russellb. Instead of adding a new option, could we just change the default behavior to be more consistent, always including tool definitions regardless of the tool_choice setting?

> FYI, tool_choice="required" uses structured outputs here.

Yes. required is already supported in general, and it behaves almost the same, except that the response must be a non-empty list. auto is more complicated to implement properly.

@okdshin (Contributor, Author) commented Jun 17, 2025

I found the discussion about why tools are dropped when tool_choice="none": #10000 (comment)
Dropping the tools seems like overkill, and wrong, because it creates an inconsistency with the other modes (named function, required, and auto).
The modification here should be sufficient to skip tool parsing when the model hallucinates a call:
https://github.com/vllm-project/vllm/pull/10000/files#diff-190c665c438d34a7190da9a4d9bc1ed24bed8b13ee1b3f20c6da5c8aa52b0f3bR475-R476

@russellb (Member)

> Currently we achieve tool_choice="auto" behavior by sending two requests from a reverse-proxy server to vLLM's API server:
>
>   • tool_choice="none" and guided_json={list_of_one_of_signature_of_tools}
>   • tool_choice="none"
>
> If the first response is an empty JSON list, we assume tool use was unnecessary and just use the second response. This is required because we want guided decoding (rather than relying on the tool_parser) to enforce that any function call strictly matches the expected signature.

The newer structural_tag response format is designed to do this. You can express the requirements for all of the tool calls and it will enforce correct formatting whenever they occur in the middle of otherwise freeform text.

There's an example in here:

"structural_tag": {
"messages": [
{
"role": "user",
"content": """
You have access to the following function to retrieve the weather in a city:
{
"name": "get_weather",
"parameters": {
"city": {
"param_type": "string",
"description": "The city to get the weather for",
"required": True
}
}
}
If a you choose to call a function ONLY reply in the following format:
<{start_tag}={function_name}>{parameters}{end_tag}
where
start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function
argument value as value.
end_tag => `</function>`
Here is an example,
<function=example_function_name>{"example_name": "example_value"}</function>
Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- Always add your sources when using search results to answer the user query
You are a helpful assistant.
Given the previous instructions, what is the weather in New York City, Boston,
and San Francisco?""",
},
],
"response_format": {
"type": "structural_tag",
"structures": [
{
"begin": "<function=get_weather>",
"schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
"end": "</function>",
}
],
"triggers": ["<function="],
},
},
}

You can see in the example that the tool definition had to be manually included in the prompt. What you're proposing here, that it can (and should) be automatic, makes sense to me.

I think changing the default behavior, plus using the structural_tag response format, would probably give you the behavior you're looking for. Would you agree?

@okdshin (Contributor, Author) commented Jun 17, 2025

Yes, I will modify this PR to remove the new option and change the default behaviour, and I will check the structural_tag response format.
However, considering combinations with other response formats, such as <think>...(not JSON text)...</think> for the reasoning feature, I think supporting something like llguidance's Lark format (guided_lark?) would ultimately give users the highest degree of freedom. But that's a separate discussion.

@okdshin changed the title from Add option "--expand-tools-even-if-tool-choice-none" to Expand tools even if tool_choice="none" on Jun 17, 2025
@okdshin (Contributor, Author) commented Jun 17, 2025

I've made the changes. Could you take another look?

@russellb (Member) left a comment


I like this change, though I'll hold off on merging to give others more time to express an opinion

@aarnphm (Collaborator) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this is a breaking change, given that requests relying on the prior behaviour would start failing once the prompt plus user messages exceed the model's max context length (unless long-context methods like YaRN are in use).

Intuitively, tool_choice="none" implies that the request won't include any tool definitions. The current behaviour also aligns with OpenAI's logic for tool_choice="none" (see here):

> none means the model will not call any tool and instead generates a message.

This doesn't mean we have to strictly follow what OpenAI does, just that if the goal is to be compatible, I don't think this behaviour makes sense.

@aarnphm changed the title from Expand tools even if tool_choice="none" to [Frontend] Expand tools even if tool_choice="none" on Jun 18, 2025
@okdshin (Contributor, Author) commented Jun 18, 2025

What about changing the default as proposed, but adding --exclude-tools-when-none-choice for users who need the old behavior for context optimization?
This would restore the original behavior (before November) while providing an opt-out for users who prefer the current approach. The key asymmetry is that users who want to avoid prompt tokens can simply send empty tools, but users who want guided decoding with tool context have no automatic way to include the tool definitions (the structural_tag example shows manual inclusion in the user message). This change would resolve that limitation while keeping flexibility for both use cases.

@aarnphm (Collaborator) commented Jun 19, 2025

An opt-out is breaking, right? For any production deployment. I think we should make this opt-in, follow the deprecation policy, and then change the default in 0.10 at the earliest.

@simon-mo do we have plans to release any patches before 0.10?

I'm fine with making this the default in 0.10, given that 0.10 will be considered breaking regardless.

@okdshin (Contributor, Author) commented Jun 19, 2025

@aarnphm You're right about the breaking change concern. Let me revert commit 377f4ac to restore the opt-in approach with the --expand-tools-even-if-tool-choice-none flag.
This will allow users who need the new behavior to opt in while keeping existing deployments unaffected.

Regarding the transition to 0.10, could you help me understand the process?

  • What's the recommended way to add deprecation warnings for this kind of behavioral change?
  • Are there any specific documentation or migration-guide requirements we should prepare?

okdshin added 8 commits June 21, 2025 23:59
Signed-off-by: okada shintarou <[email protected]>
Signed-off-by: okada shintarou <[email protected]>
Signed-off-by: okada shintarou <[email protected]>
Signed-off-by: okada shintarou <[email protected]>
Signed-off-by: okada shintarou <[email protected]>
@okdshin force-pushed the specifiable_tools_expansion branch from 9559897 to e282ebb on June 21, 2025
@okdshin (Contributor, Author) commented Jun 21, 2025

@aarnphm @russellb I've addressed the review feedback by adding the deprecation warning 1ffd7f0 , updating the help text e3733a9 , and documenting the behavior change e282ebb . Could you take another look?

okdshin added 2 commits June 22, 2025 00:14
Signed-off-by: okada shintarou <[email protected]>
Signed-off-by: okada shintarou <[email protected]>
@aarnphm requested review from aarnphm and russellb on June 22, 2025
```python
    tool_dicts = None
elif (request.tool_choice == "none"
      and not self.expand_tools_even_if_tool_choice_none):
    assert request.tools is not None
```
@aarnphm (Collaborator):

Let's not use assert in the performance path here; if this is mostly for typing, we can gate it behind TYPE_CHECKING.

@okdshin (Contributor, Author):

@aarnphm Thanks for the review! I understand the performance concern, but I'd like to keep the assert here for a specific reason.

This assert serves as a defensive programming guard rather than just type checking. The logic is:

  1. First condition: if request.tools is None → tool_dicts = None
  2. Second condition: elif (request.tool_choice == "none" and not self.expand_tools_even_if_tool_choice_none)

The assert ensures that if someone modifies the first condition in the future (e.g., adds another OR condition), we'll catch the logic error immediately with a clear AssertionError, rather than a confusing TypeError (object of type 'NoneType' has no len()) when we call len(request.tools) below.

While I understand the performance concern, in the context of vLLM's request processing pipeline, this single assertion check is dwarfed by the actual bottlenecks like model inference, GPU operations, and network I/O. The cost of one conditional check per request is negligible compared to the milliseconds/seconds spent on actual LLM processing.

Given that trade-off, I think the defensive programming benefit outweighs the minimal performance cost. What do you think?
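For reference, a condensed sketch of the branch in question (an illustrative reconstruction, not the exact vLLM source):

```python
# Hypothetical reconstruction of the surrounding logic in the serving layer.
if request.tools is None:
    tool_dicts = None
elif (request.tool_choice == "none"
      and not self.expand_tools_even_if_tool_choice_none):
    # Reachable only when tools is a list; the assert fails fast if a future
    # edit to the first branch breaks that invariant.
    assert request.tools is not None
    tool_dicts = None  # legacy behaviour: drop the definitions
else:
    tool_dicts = [tool.model_dump() for tool in request.tools]
```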

@okdshin (Contributor, Author):

@aarnphm You're absolutely right. Looking at this again, adding TYPE_CHECKING import just for this assert would be overkill, and the assert itself isn't really necessary here. Let me remove it and keep the code simple. Thanks for the guidance!

@aarnphm (Collaborator) left a comment

I'm good with this. Can you look at the CI failure? It seems relevant to this PR.

Co-authored-by: Aaron Pham <[email protected]>
Signed-off-by: okada shintarou <[email protected]>
@okdshin force-pushed the specifiable_tools_expansion branch from 8750191 to 90eabb2 on June 22, 2025
Signed-off-by: okada shintarou <[email protected]>
@okdshin (Contributor, Author) commented Jun 22, 2025

@aarnphm @russellb I've addressed the review feedback. I'm not sure why CI failed earlier, but there are no failures now. I also ran tests/entrypoints/openai/test_completion_with_prompt_embeds.py locally and confirmed it passes. Could you take another look?

@aarnphm added the ready label on Jun 22, 2025
@aarnphm (Collaborator) left a comment

Ok, stamp from me, waiting for @russellb

@okdshin (Contributor, Author) commented Jun 27, 2025

Hi @russellb,

Could you please take a look at the latest changes? The PR has been updated based on your previous suggestions about following the deprecation process. I believe this is now ready for your review.
If anything else needs to be fixed, just let me know - happy to make any additional changes.

@russellb (Member) left a comment

apologies for the delay. looks good, thank you!

@russellb merged commit 3d19d47 into vllm-project:main on Jul 1, 2025
78 checks passed
CSWYF3634076 pushed a commit to CSWYF3634076/vllm that referenced this pull request Jul 2, 2025
Labels: documentation, frontend, ready, tool-calling