Add MCP (Model Context Protocol) integration for enhanced research #230
Allow users to specify the research query in the configuration file as a fallback when not provided via CLI. This enables easier reuse of predefined queries for specific research scenarios.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
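The fallback order described in this commit (CLI first, then the configuration file) can be sketched as below. This is a minimal illustration and not the code in `run.py`: the function name `resolve_query` and the config key `"query"` are assumptions made for the example.

```python
from typing import Any, Mapping, Optional


def resolve_query(cli_query: Optional[str], config: Mapping[str, Any]) -> Optional[str]:
    """Return the CLI query if given, else the config file's query, else None."""
    if cli_query and cli_query.strip():
        return cli_query.strip()
    # "query" is a hypothetical key name; the real config schema may differ.
    config_query = config.get("query")
    if isinstance(config_query, str) and config_query.strip():
        return config_query.strip()
    # Neither source supplied a query; the caller decides what to do next.
    return None
```

Returning `None` instead of raising leaves the caller free to log the missing query and still decide whether to run the pipeline with a default.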
- Add new MCP step that uses Anthropic's Claude with MCP tools to perform additional targeted searches via Exa
- Implement MCPResult model and custom materializer for visualization
- Update final report step to properly handle MCPResult objects
- Add preprocessing for Pydantic objects in MCP prompts
- Update README with MCP integration details and requirements
- Add support for MCP-powered searches including research papers, companies, LinkedIn, Wikipedia, and GitHub

The MCP step runs after reflection/approval and before final report generation, providing an additional layer of research depth using advanced search capabilities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Corrected "mertic" to "metric" for clarity and accuracy in the code documentation. This change enhances the readability and maintainability of the code by ensuring that comments accurately reflect their intended meaning.
When replacing SambaNova models with Google ones, the model names used the incorrect format "google/gemini-*" instead of the correct LiteLLM format "openrouter/google/gemini-*" for OpenRouter routing.

Changes:
- Update all model defaults from "google/gemini-*" to "openrouter/google/gemini-*"
- Fix provider validation in llm_utils.py to handle OpenRouter's nested format
- Update comments to clarify correct naming conventions
- Ensure all Google Gemini models use proper OpenRouter prefix

This fixes the "LLM Provider NOT provided" error when using Google models.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
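The nested-prefix problem described above can be illustrated with a small check that treats only the first path segment as the provider. This is a sketch under stated assumptions, not the validation in `llm_utils.py` and not LiteLLM's own logic: `KNOWN_PROVIDERS`, `split_provider`, and the model name `gemini-example` are hypothetical.

```python
from typing import Tuple

# Illustrative set only; the project's real list of accepted prefixes may differ.
KNOWN_PROVIDERS = {"openrouter", "anthropic", "openai", "sambanova"}


def split_provider(model_name: str) -> Tuple[str, str]:
    """Split 'provider/rest' on the first slash and validate the provider."""
    provider, sep, rest = model_name.partition("/")
    if not sep or not rest:
        raise ValueError(f"No provider prefix in model name: {model_name!r}")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown provider prefix {provider!r} in {model_name!r}")
    # For OpenRouter the remainder is itself nested, e.g. 'google/gemini-example'.
    return provider, rest
```

With this check, `"openrouter/google/gemini-example"` resolves to the `openrouter` provider, while the bare `"google/gemini-example"` form is rejected early with a clear message.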
Pull Request Overview
Adds a new MCP (Model Context Protocol) integration step to the research pipeline, incorporates its results into report generation, and updates documentation and configs to support MCP.
- Introduces `MCPResult` model, materializer, and `mcp_updates_step` to fetch and process Exa search results via Anthropic
- Integrates `mcp_results` into all report-generation steps and the parallel pipeline
- Updates configs, requirements, and README to document and enable MCP usage
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
huggingface-sagemaker/utils/misc.py | Corrected a typo in the accuracy comment |
deep_research/utils/pydantic_models.py | Added MCPResult Pydantic model for MCP search results |
deep_research/utils/prompts.py | Defined MCP_PROMPT for the MCP step |
deep_research/utils/llm_utils.py | Expanded provider-prefix checks to handle OpenRouter/Gemini |
deep_research/steps/query_decomposition_step.py | Updated default LLM model to an OpenRouter Gemini variant |
deep_research/steps/pydantic_final_report_step.py | Added extract_mcp_content helper and injected mcp_results |
deep_research/steps/process_sub_question_step.py | Updated default LLM models to use OpenRouter Gemini |
deep_research/steps/mcp_step.py | Created the MCP-driven search step using Anthropic + Exa |
deep_research/steps/generate_reflection_step.py | Updated default LLM model to OpenRouter Gemini |
deep_research/steps/execute_approved_searches_step.py | Updated default LLM model to OpenRouter Gemini |
deep_research/steps/cross_viewpoint_step.py | Updated default LLM model to OpenRouter Gemini |
deep_research/run.py | Added fallback to config query and adjusted pipeline invocation |
deep_research/requirements.txt | Added anthropic dependency |
deep_research/pipelines/parallel_research_pipeline.py | Inserted mcp_updates_step into the pipeline flow |
deep_research/materializers/mcp_result_materializer.py | New materializer for visualizing MCPResult |
deep_research/materializers/__init__.py | Registered MCPResultMaterializer |
deep_research/configs/enhanced_research.yaml | Updated LLM model versions for various steps |
deep_research/README.md | Documented MCP integration and updated usage instructions |
README.md | Added “Deep Research” entry with MCP support to project overview |
Comments suppressed due to low confidence (5)
deep_research/run.py:318
- The fallback branch logs the absence of a query but never invokes the pipeline. Ensure you call `pipeline(...)` when no CLI or config query is provided.
else:
deep_research/steps/pydantic_final_report_step.py:195
- Docstring args are out of sync with the signature: `mcp_results` was added before `llm_model`. Update the Args section accordingly.
def generate_executive_summary(
deep_research/steps/pydantic_final_report_step.py:284
- Docstring is missing the new `mcp_results` parameter. Add it to the Args section to match the function signature.
def generate_introduction(
deep_research/steps/pydantic_final_report_step.py:37
- [nitpick] Add unit tests for `extract_mcp_content` covering cases where only `mcp_result`, only `raw_mcp_result`, and neither are provided.
def extract_mcp_content(mcp_results: MCPResult) -> str:
deep_research/steps/pydantic_final_report_step.py:37
- The function uses html.escape but the html module is not imported. Add 'import html' at the top.
def extract_mcp_content(mcp_results: MCPResult) -> str:
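The two comments on `extract_mcp_content` (the missing `import html` and the requested unit tests) can be pictured together in one sketch. It is not the PR's implementation: the dataclass below stands in for the Pydantic `MCPResult` model, the string field types are assumed, and preferring `mcp_result` over `raw_mcp_result` is an assumption about the intended precedence.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPResult:
    """Stand-in for the PR's Pydantic model; field types are assumed."""

    mcp_result: Optional[str] = None
    raw_mcp_result: Optional[str] = None


def extract_mcp_content(mcp_results: MCPResult) -> str:
    """Return HTML-escaped MCP text, preferring the processed result over the raw one."""
    content = mcp_results.mcp_result or mcp_results.raw_mcp_result or ""
    return html.escape(content)


def test_extract_mcp_content() -> None:
    # Only the processed result is set.
    assert extract_mcp_content(MCPResult(mcp_result="a < b")) == "a &lt; b"
    # Only the raw result is set.
    assert extract_mcp_content(MCPResult(raw_mcp_result="x & y")) == "x &amp; y"
    # Neither field is set.
    assert extract_mcp_content(MCPResult()) == ""
```

The test function covers the three cases named in the nitpick and can be collected by pytest or called directly.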
Co-authored-by: Copilot <[email protected]>
Merging this to fix the images + to have the feature on
Summary
Key Changes
MCP Search Capabilities
The MCP integration provides access to:
- `research_paper_search`: Academic paper and research content
- `company_research`: Company website crawling for business information
- `competitor_finder`: Find company competitors
- `linkedin_search`: Search LinkedIn for companies and people
- `wikipedia_search_exa`: Wikipedia article retrieval
- `github_search`: GitHub repositories and issues

Requirements
- `ANTHROPIC_API_KEY`: For accessing Claude with MCP capabilities
- `EXA_API_KEY`: For the Exa search tools used by MCP

Testing
The pipeline has been tested with the MCP integration and handles both successful searches and error cases gracefully.
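One error case worth catching before the MCP step runs is a missing API key. The sketch below checks for the two environment variables named under Requirements; the helper `missing_api_keys` is hypothetical and is not part of the PR.

```python
import os
from typing import List, Mapping, Optional

# The two variable names come from the Requirements list in this PR.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "EXA_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not source.get(name)]
```

A caller could skip the MCP step or fail early with a clear message when the returned list is non-empty; passing an explicit mapping keeps the check easy to test.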
🤖 Generated with Claude Code