Fix cache hit rate by making MCP tools order deterministic #2611
All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
Or should we switch to
@bolinfest I agree. Switching to IndexMap would make the intent clearer and ensure the order is preserved. The change is a bit broader than my current patch. Would that be okay?
@warpdev @bolinfest given the impact here, I think it would be reasonable to land the fix with the current sort implementation and leave 2 things as follow-ups:
Ok I'll drive the IndexMap change after...thanks for taking the initiative on this!
@bolinfest I'm happy to take on the IndexMap change! I did suggest this approach, after all 😅
Also documenting that I tested this change with MCP servers enabled: I can reproduce the issue and confirmed the cache hit rate is back up on this branch.
Fixes #2610
This PR sorts the tools in get_openai_tools by name to ensure a consistent MCP tool order.
Currently, MCP servers are stored in a HashMap, which does not guarantee ordering. As a result, the tool order changes across turns, effectively breaking prompt caching in multi-turn sessions.
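To illustrate the idea, here is a minimal sketch of sorting tools by name after collecting them from a HashMap. It is not the code from this PR: ToolSpec, tools_in_stable_order, and sample_tools are hypothetical names invented for the example, and the real get_openai_tools may be structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tool definition (not the type used in the PR).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolSpec {
    pub name: String,
    pub description: String,
}

/// Collects the tools out of a HashMap and sorts them by name, so the
/// returned order no longer depends on HashMap iteration order.
pub fn tools_in_stable_order(tools: &HashMap<String, ToolSpec>) -> Vec<ToolSpec> {
    let mut ordered: Vec<ToolSpec> = tools.values().cloned().collect();
    ordered.sort_by(|a, b| a.name.cmp(&b.name));
    ordered
}

/// Builds a small example map; the tool names are made up for illustration.
pub fn sample_tools() -> HashMap<String, ToolSpec> {
    let mut tools = HashMap::new();
    for (name, description) in [
        ("web_search", "Search the web"),
        ("fs_read", "Read a file"),
        ("git_status", "Show repository status"),
    ] {
        tools.insert(
            name.to_string(),
            ToolSpec {
                name: name.to_string(),
                description: description.to_string(),
            },
        );
    }
    tools
}
```

Calling tools_in_stable_order(&sample_tools()) yields the same sequence on every call, whereas iterating the map directly may yield a different order from one run to the next.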
An alternative solution would be to replace the HashMap with an ordered structure, but that would require a much larger code change. Since the number of MCP tools is unlikely to ever be large enough for sorting to cause performance issues, this lightweight fix is chosen instead.
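For comparison, the ordered-structure alternative could look roughly like the sketch below. It uses the standard library's BTreeMap to stay dependency-free, which is an assumption of this example; the thread discusses IndexMap, which preserves insertion order rather than sorting by key. The function names and tool entries are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stores (name, description) pairs in a BTreeMap, which always iterates
/// in ascending key order regardless of insertion order.
pub fn register_tools(entries: &[(&str, &str)]) -> BTreeMap<String, String> {
    entries
        .iter()
        .map(|(name, description)| (name.to_string(), description.to_string()))
        .collect()
}

/// Returns the tool names in the map's iteration order (sorted by key).
pub fn tool_names(tools: &BTreeMap<String, String>) -> Vec<String> {
    tools.keys().cloned().collect()
}
```

With this layout no explicit sort is needed at the call site, at the cost of changing the container type everywhere the map is used, which is the broader change the description refers to.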
By ensuring deterministic tool order, this change should significantly improve cache hit rates and prevent users from hitting usage limits too quickly. (For reference, my own sessions last week reached the limit unusually fast, with cache hit rates falling below 1%.)
Result
After this fix, sessions with MCP servers now show caching behavior almost identical to sessions without MCP servers.