Add MCP (Model Context Protocol) integration for enhanced research #230

Merged
strickvl merged 7 commits into main from feature/mcp-dr on Jun 4, 2025

Conversation

strickvl
Contributor

@strickvl strickvl commented Jun 3, 2025

Summary

  • Adds a new MCP (Model Context Protocol) integration step to the research pipeline
  • Leverages Anthropic's Claude with MCP tools to perform additional targeted searches via Exa
  • Provides an extra layer of research depth with advanced search capabilities

Key Changes

  • New MCP step: Runs after reflection/approval and before final report generation
  • MCPResult model: New Pydantic model to handle MCP search results
  • Custom materializer: MCPResultMaterializer for visualizing both raw JSON and processed HTML results
  • Updated final report step: Properly handles MCPResult objects with preprocessing
  • Documentation: Comprehensive README updates explaining MCP integration and requirements
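As a rough illustration of the result container described above, here is a minimal sketch. The field names `raw_mcp_result` and `mcp_result` come from the review comments on this PR. Their types, the defaults, and the `has_content` helper are assumptions. A stdlib dataclass stands in for the Pydantic model so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass


@dataclass
class MCPResultSketch:
    """Stand-in for the PR's Pydantic MCPResult model (hypothetical shape)."""

    raw_mcp_result: str = ""  # assumed: raw JSON text returned by the MCP tools
    mcp_result: str = ""      # assumed: processed summary of that raw output

    def has_content(self) -> bool:
        """Return True when either field holds non-whitespace text."""
        return bool(self.raw_mcp_result.strip() or self.mcp_result.strip())


empty = MCPResultSketch()
filled = MCPResultSketch(mcp_result="Three relevant papers found.")
print(empty.has_content(), filled.has_content())
```

Defaulting both fields to empty strings would let the final report step treat a failed or skipped MCP search the same way as one that returned nothing.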

MCP Search Capabilities

The MCP integration provides access to:

  • research_paper_search: Academic paper and research content
  • company_research: Company website crawling for business information
  • competitor_finder: Find company competitors
  • linkedin_search: Search LinkedIn for companies and people
  • wikipedia_search_exa: Wikipedia article retrieval
  • github_search: GitHub repositories and issues

Requirements

  • ANTHROPIC_API_KEY: For accessing Claude with MCP capabilities
  • EXA_API_KEY: For the Exa search tools used by MCP
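A quick way to fail early when either key is missing is sketched below. The environment variable names are the ones listed above; the `missing_api_keys` helper is hypothetical and not part of this PR.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "EXA_API_KEY")


def missing_api_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are unset or blank in the given mapping."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


missing = missing_api_keys(os.environ)
if missing:
    print(f"MCP step cannot run; missing: {', '.join(missing)}")
else:
    print("Both API keys are set.")
```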

Testing

The pipeline has been tested with the MCP integration and handles both successful searches and error cases gracefully.

🤖 Generated with Claude Code

strickvl and others added 2 commits June 3, 2025 14:33
Allow users to specify the research query in the configuration file as a fallback when not provided via CLI. This enables easier reuse of predefined queries for specific research scenarios.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Add new MCP step that uses Anthropic's Claude with MCP tools to perform additional targeted searches via Exa
- Implement MCPResult model and custom materializer for visualization
- Update final report step to properly handle MCPResult objects
- Add preprocessing for Pydantic objects in MCP prompts
- Update README with MCP integration details and requirements
- Add support for MCP-powered searches including research papers, companies, LinkedIn, Wikipedia, and GitHub

The MCP step runs after reflection/approval and before final report generation, providing an additional layer of research depth using advanced search capabilities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@strickvl strickvl added the enhancement (New feature or request) and internal labels Jun 3, 2025
@strickvl strickvl requested a review from htahir1 June 3, 2025 15:52
dagshub bot commented Jun 3, 2025

strickvl and others added 4 commits June 3, 2025 17:57
- Corrected "mertic" to "metric" for clarity and accuracy in the code documentation.

This keeps the code comment accurate and easier to read.

When replacing SambaNova models with Google ones, the model names used the
incorrect format "google/gemini-*" instead of the LiteLLM format
"openrouter/google/gemini-*" required for OpenRouter routing.

Changes:
- Update all model defaults from "google/gemini-*" to "openrouter/google/gemini-*"
- Fix provider validation in llm_utils.py to handle OpenRouter's nested format
- Update comments to clarify correct naming conventions
- Ensure all Google Gemini models use proper OpenRouter prefix

This fixes the "LLM Provider NOT provided" error when using Google models.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
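
The provider-prefix handling described in this commit can be sketched as below. This is not the code from llm_utils.py: `split_provider` and the `KNOWN_PROVIDERS` set are hypothetical, and the model name in the usage lines is only an example of the "gemini-*" pattern. The idea is to split on the first slash only, so that OpenRouter's nested "openrouter/google/..." form keeps "openrouter" as the provider.

```python
# Illustrative set of accepted prefixes; the PR's actual list is not shown.
KNOWN_PROVIDERS = {"openrouter", "openai", "anthropic"}


def split_provider(model: str) -> tuple[str, str]:
    """Split 'provider/rest' on the first slash and validate the provider."""
    provider, sep, rest = model.partition("/")
    if not sep or not rest:
        raise ValueError(f"Model {model!r} has no provider prefix")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown provider prefix {provider!r} in {model!r}")
    return provider, rest


print(split_provider("openrouter/google/gemini-2.0-flash"))
try:
    split_provider("google/gemini-2.0-flash")
except ValueError as err:
    print(err)
```

With this check, a bare "google/..." name is rejected up front instead of surfacing later as a provider error from the LLM client.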
@strickvl strickvl requested a review from Copilot June 4, 2025 15:09
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds a new MCP (Model Context Protocol) integration step to the research pipeline, incorporates its results into report generation, and updates documentation and configs to support MCP.

  • Introduces MCPResult model, materializer, and mcp_updates_step to fetch and process Exa search results via Anthropic
  • Integrates mcp_results into all report-generation steps and the parallel pipeline
  • Updates configs, requirements, and README to document and enable MCP usage

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| huggingface-sagemaker/utils/misc.py | Corrected a typo in the accuracy comment |
| deep_research/utils/pydantic_models.py | Added MCPResult Pydantic model for MCP search results |
| deep_research/utils/prompts.py | Defined MCP_PROMPT for the MCP step |
| deep_research/utils/llm_utils.py | Expanded provider-prefix checks to handle OpenRouter/Gemini |
| deep_research/steps/query_decomposition_step.py | Updated default LLM model to an OpenRouter Gemini variant |
| deep_research/steps/pydantic_final_report_step.py | Added extract_mcp_content helper and injected mcp_results |
| deep_research/steps/process_sub_question_step.py | Updated default LLM models to use OpenRouter Gemini |
| deep_research/steps/mcp_step.py | Created the MCP-driven search step using Anthropic + Exa |
| deep_research/steps/generate_reflection_step.py | Updated default LLM model to OpenRouter Gemini |
| deep_research/steps/execute_approved_searches_step.py | Updated default LLM model to OpenRouter Gemini |
| deep_research/steps/cross_viewpoint_step.py | Updated default LLM model to OpenRouter Gemini |
| deep_research/run.py | Added fallback to config query and adjusted pipeline invocation |
| deep_research/requirements.txt | Added anthropic dependency |
| deep_research/pipelines/parallel_research_pipeline.py | Inserted mcp_updates_step into the pipeline flow |
| deep_research/materializers/mcp_result_materializer.py | New materializer for visualizing MCPResult |
| deep_research/materializers/__init__.py | Registered MCPResultMaterializer |
| deep_research/configs/enhanced_research.yaml | Updated LLM model versions for various steps |
| deep_research/README.md | Documented MCP integration and updated usage instructions |
| README.md | Added “Deep Research” entry with MCP support to project overview |

Comments suppressed due to low confidence (5)

deep_research/run.py:318

  • The fallback branch logs the absence of a query but never invokes the pipeline. Ensure you call pipeline(...) when no CLI or config query is provided.
else:
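
One way to address this comment is sketched below: resolve the query from the CLI first, then the config, and invoke the pipeline in every branch. The `query` config key, the `resolve_query` and `run_research` names, and the pipeline's `query` keyword are assumptions for illustration, not the actual code in run.py.

```python
import logging
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger(__name__)


def resolve_query(cli_query: Optional[str], config: Mapping[str, Any]) -> Optional[str]:
    """Prefer the CLI query, fall back to the config's 'query' key, else None."""
    if cli_query and cli_query.strip():
        return cli_query.strip()
    config_query = config.get("query")  # assumed config key
    if isinstance(config_query, str) and config_query.strip():
        return config_query.strip()
    return None


def run_research(
    cli_query: Optional[str],
    config: Mapping[str, Any],
    pipeline: Callable[..., Any],
) -> Any:
    """Always invoke the pipeline, with or without an explicit query."""
    query = resolve_query(cli_query, config)
    if query is None:
        logger.info("No query from CLI or config; using the pipeline default.")
        return pipeline()
    return pipeline(query=query)
```

Passing the pipeline in as a callable keeps the fallback logic testable without starting a real run.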

deep_research/steps/pydantic_final_report_step.py:195

  • Docstring args are out of sync with the signature: mcp_results was added before llm_model. Update the Args section accordingly.
def generate_executive_summary(

deep_research/steps/pydantic_final_report_step.py:284

  • Docstring is missing the new mcp_results parameter. Add it to the Args section to match the function signature.
def generate_introduction(

deep_research/steps/pydantic_final_report_step.py:37

  • [nitpick] Add unit tests for extract_mcp_content covering cases where only mcp_result, only raw_mcp_result, and neither are provided.
def extract_mcp_content(mcp_results: MCPResult) -> str:

deep_research/steps/pydantic_final_report_step.py:37

  • The function uses html.escape but the html module is not imported. Add 'import html' at the top.
def extract_mcp_content(mcp_results: MCPResult) -> str:
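
The last two comments can be illustrated together with a minimal sketch of such a helper, including the `import html` that the review asks for and the three cases named in the nitpick. The body is an assumption about what the function might do; only the function name, the two field names, and the use of html.escape come from the review. A dataclass stands in for the real MCPResult model.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPResultSketch:
    """Hypothetical stand-in for the PR's MCPResult model."""

    raw_mcp_result: str = ""
    mcp_result: str = ""


def extract_mcp_content(mcp_results: Optional[MCPResultSketch]) -> str:
    """Return HTML-escaped MCP text, or an empty string when nothing is set."""
    if mcp_results is None:
        return ""
    parts = []
    if mcp_results.mcp_result.strip():
        parts.append(html.escape(mcp_results.mcp_result.strip()))
    if mcp_results.raw_mcp_result.strip():
        parts.append(html.escape(mcp_results.raw_mcp_result.strip()))
    return "\n\n".join(parts)


# The three cases from the review: only mcp_result, only raw_mcp_result, neither.
print(extract_mcp_content(MCPResultSketch(mcp_result="<b>x</b>")))
print(extract_mcp_content(MCPResultSketch(raw_mcp_result="a & b")))
print(repr(extract_mcp_content(MCPResultSketch())))
```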

@strickvl
Contributor Author

strickvl commented Jun 4, 2025

Merging this to fix the images + to have the feature on main. Code is tested / working.

@strickvl strickvl merged commit 411d2fb into main Jun 4, 2025
5 checks passed
@strickvl strickvl deleted the feature/mcp-dr branch June 4, 2025 16:09