
Conversation

@warpdev (Contributor) commented Aug 23, 2025

Fixes #2610

This PR sorts the tools in get_openai_tools by name to ensure a consistent MCP tool order.

Currently, MCP servers are stored in a HashMap, which does not guarantee ordering. As a result, the tool order changes across turns, effectively breaking prompt caching in multi-turn sessions.

An alternative solution would be to replace the HashMap with an ordered structure, but that would require a much larger code change. Since no realistic session has enough MCP tools for sorting to be a performance concern, this lightweight fix is chosen instead.
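As a rough sketch of the idea (the `Tool` type here is a hypothetical stand-in; the real code lives in `get_openai_tools`), the fix amounts to collecting the tools out of the HashMap and sorting them by name before emitting them:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real tool type.
#[derive(Clone)]
struct Tool {
    name: String,
}

// Collect MCP tools from the HashMap, then sort by name so the order
// is stable across turns (HashMap iteration order is not guaranteed).
fn get_openai_tools(mcp_tools: &HashMap<String, Tool>) -> Vec<Tool> {
    let mut tools: Vec<Tool> = mcp_tools.values().cloned().collect();
    tools.sort_by(|a, b| a.name.cmp(&b.name));
    tools
}
```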

By ensuring deterministic tool order, this change should significantly improve cache hit rates and prevent users from hitting usage limits too quickly. (For reference, my own sessions last week reached the limit unusually fast, with cache hit rates falling below 1%.)

Result

After this fix, sessions with MCP servers now show caching behavior almost identical to sessions without MCP servers.

[Screenshots comparing token-usage breakdowns: Without MCP | With MCP]


github-actions bot commented Aug 23, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@warpdev (Contributor, Author) commented Aug 23, 2025

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Aug 23, 2025
@warpdev changed the title from "Make MCP tools order deterministic to cache input tokens" to "Fix cache hit rate by making MCP tools order deterministic" Aug 23, 2025
@bolinfest (Collaborator) commented

Or should we switch to IndexMap so the order is consistent, but matches the order users list them in config.toml (assuming our TOML parser preserves order...)?
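For illustration, a minimal sketch of the IndexMap behavior being suggested (assuming the indexmap crate; whether the TOML parser actually preserves order remains the open question above):

```rust
use indexmap::IndexMap;

fn main() {
    // Unlike HashMap, IndexMap iterates in insertion order, so tools
    // would come out in the order users list their MCP servers.
    let mut servers: IndexMap<&str, &str> = IndexMap::new();
    servers.insert("zeta-server", "...");
    servers.insert("alpha-server", "...");

    // Prints "zeta-server" then "alpha-server": insertion order,
    // not alphabetical or hash order.
    for name in servers.keys() {
        println!("{name}");
    }
}
```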

@warpdev (Contributor, Author) commented Aug 25, 2025

@bolinfest I agree. Switching to IndexMap would make the intent clearer and ensure the order is preserved. The change is a bit broader than my current patch. Would that be okay?

@dylan-hurd-oai (Collaborator) commented

@warpdev @bolinfest given the impact here, I think it would be reasonable to land the fix with the current sort implementation and leave 2 things as follow-ups:

  1. the IndexMap change
  2. adding an integration test to prompt_caching.rs to test this end to end and ensure any serialization changes don't impact our cache hit rate (a rough sketch of the idea follows below)
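For illustration, a rough sketch of what such a test could assert, reusing the hypothetical `Tool` and `get_openai_tools` names from the earlier sketch (the real test would live in prompt_caching.rs and exercise serialization end to end):

```rust
// Build the same HashMap twice; each HashMap instance gets its own
// hasher state, so iteration order can differ between the two.
fn make_test_tools() -> HashMap<String, Tool> {
    ["beta", "alpha", "gamma"]
        .iter()
        .map(|n| (n.to_string(), Tool { name: n.to_string() }))
        .collect()
}

#[test]
fn mcp_tool_order_is_deterministic() {
    let first = get_openai_tools(&make_test_tools());
    let second = get_openai_tools(&make_test_tools());
    let names = |tools: &[Tool]| -> Vec<String> {
        tools.iter().map(|t| t.name.clone()).collect()
    };
    // Any drift in tool order between turns would invalidate the
    // prompt cache, so the two listings must match exactly.
    assert_eq!(names(&first), names(&second));
}
```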

@bolinfest (Collaborator) commented

Ok I'll drive the IndexMap change after...thanks for taking the initiative on this!

@bolinfest bolinfest self-assigned this Aug 25, 2025
@bolinfest bolinfest self-requested a review August 25, 2025 02:29
@dylan-hurd-oai (Collaborator) commented

@bolinfest I'm happy to take on the IndexMap change! I did suggest this approach, after all 😅

@dylan-hurd-oai (Collaborator) commented

Also documenting that I tested this change with MCP servers enabled: I reproduced the issue and confirmed the cache hit rate is back up on this branch.

@bolinfest bolinfest removed their assignment Aug 25, 2025
@bolinfest bolinfest merged commit ee2ccb5 into openai:main Aug 25, 2025
15 of 18 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 25, 2025
Linked issue: Token cache rate drops drastically when using more than one MCP tool (#2610)