Add MCP (Model Context Protocol) integration for enhanced research #230

strickvl · 2025-06-03T15:52:57Z

Summary

Adds a new MCP (Model Context Protocol) integration step to the research pipeline
Leverages Anthropic's Claude with MCP tools to perform additional targeted searches via Exa
Provides an extra layer of research depth with advanced search capabilities

Key Changes

New MCP step: Runs after reflection/approval and before final report generation
MCPResult model: New Pydantic model to handle MCP search results
Custom materializer: MCPResultMaterializer for visualizing both raw JSON and processed HTML results
Updated final report step: Properly handles MCPResult objects with preprocessing
Documentation: Comprehensive README updates explaining MCP integration and requirements

MCP Search Capabilities

The MCP integration provides access to:

research_paper_search: Academic paper and research content
company_research: Company website crawling for business information
competitor_finder: Find company competitors
linkedin_search: Search LinkedIn for companies and people
wikipedia_search_exa: Wikipedia article retrieval
github_search: GitHub repositories and issues

Requirements

ANTHROPIC_API_KEY: For accessing Claude with MCP capabilities
EXA_API_KEY: For the Exa search tools used by MCP
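
Since the MCP step depends on both keys, a pre-flight check can fail early with a clear message. The following is a minimal sketch, assuming the keys are supplied as environment variables; the helper name `missing_api_keys` is hypothetical and not part of this PR.

```python
import os
from typing import List, Mapping, Optional

# The two keys named in the Requirements section.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "EXA_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or blank in `env`.

    Hypothetical helper: defaults to the process environment when
    no mapping is passed in.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All MCP API keys are set.")
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to test without touching the real environment.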

Testing

The pipeline has been tested with the MCP integration and handles both successful searches and error cases gracefully.

🤖 Generated with Claude Code

Allow users to specify the research query in the configuration file as a fallback when not provided via CLI. This enables easier reuse of predefined queries for specific research scenarios.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
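
The fallback described in this commit can be sketched as a small resolution function: the CLI value wins, then the configuration file's value. This is an illustration only. The function name `resolve_query` and the config key `"query"` are assumptions, not taken from `run.py`.

```python
from typing import Any, Mapping, Optional


def resolve_query(cli_query: Optional[str], config: Mapping[str, Any]) -> Optional[str]:
    """Pick the research query: CLI value first, then the config file's value.

    Returns None when neither source provides a non-blank query, so the
    caller can decide whether to use a default or raise an error.
    """
    if cli_query and cli_query.strip():
        return cli_query.strip()
    config_query = config.get("query")  # assumed key name
    if isinstance(config_query, str) and config_query.strip():
        return config_query.strip()
    return None
```

Returning `None` rather than silently substituting a default keeps the "no query anywhere" case visible to the caller, which still has to invoke the pipeline in that branch.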

- Add new MCP step that uses Anthropic's Claude with MCP tools to perform additional targeted searches via Exa
- Implement MCPResult model and custom materializer for visualization
- Update final report step to properly handle MCPResult objects
- Add preprocessing for Pydantic objects in MCP prompts
- Update README with MCP integration details and requirements
- Add support for MCP-powered searches including research papers, companies, LinkedIn, Wikipedia, and GitHub

The MCP step runs after reflection/approval and before final report generation, providing an additional layer of research depth using advanced search capabilities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

dagshub · 2025-06-03T15:53:00Z

Join the discussion on DagsHub!

- Corrected "mertic" to "metric" for clarity and accuracy in the code documentation. This change enhances the readability and maintainability of the code by ensuring that comments accurately reflect their intended meaning.

When replacing SambaNova models with Google ones, the model names were using incorrect format "google/gemini-*" instead of the correct LiteLLM format "openrouter/google/gemini-*" for OpenRouter routing.

Changes:
- Update all model defaults from "google/gemini-*" to "openrouter/google/gemini-*"
- Fix provider validation in llm_utils.py to handle OpenRouter's nested format
- Update comments to clarify correct naming conventions
- Ensure all Google Gemini models use proper OpenRouter prefix

This fixes the "LLM Provider NOT provided" error when using Google models.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
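
The naming fix can be illustrated with two small helpers: one adds the `openrouter/` prefix to bare Gemini names, the other splits the provider off the front of a nested model string. This is a sketch under stated assumptions. Both function names are hypothetical, and the real validation in `llm_utils.py` may differ.

```python
from typing import Tuple


def normalize_model_name(model: str) -> str:
    """Route bare "google/gemini-*" names through OpenRouter (hypothetical helper)."""
    if model.startswith("google/gemini-"):
        return "openrouter/" + model
    return model


def split_provider(model: str) -> Tuple[str, str]:
    """Split "provider/rest" on the first slash.

    The remainder may itself contain slashes, which is what makes
    OpenRouter's nested "openrouter/google/gemini-*" format valid.
    """
    provider, sep, rest = model.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"LLM provider not provided in model name: {model!r}")
    return provider, rest
```

For example, `split_provider(normalize_model_name("google/gemini-example"))` yields `("openrouter", "google/gemini-example")`, where `gemini-example` is a placeholder rather than a real model name.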

Copilot

Pull Request Overview

Adds a new MCP (Model Context Protocol) integration step to the research pipeline, incorporates its results into report generation, and updates documentation and configs to support MCP.

Introduces MCPResult model, materializer, and mcp_updates_step to fetch and process Exa search results via Anthropic
Integrates mcp_results into all report-generation steps and the parallel pipeline
Updates configs, requirements, and README to document and enable MCP usage

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| huggingface-sagemaker/utils/misc.py | Corrected a typo in the accuracy comment |
| deep_research/utils/pydantic_models.py | Added `MCPResult` Pydantic model for MCP search results |
| deep_research/utils/prompts.py | Defined `MCP_PROMPT` for the MCP step |
| deep_research/utils/llm_utils.py | Expanded provider-prefix checks to handle OpenRouter/Gemini |
| deep_research/steps/query_decomposition_step.py | Updated default LLM model to an OpenRouter Gemini variant |
| deep_research/steps/pydantic_final_report_step.py | Added `extract_mcp_content` helper and injected `mcp_results` |
| deep_research/steps/process_sub_question_step.py | Updated default LLM models to use OpenRouter Gemini |
| deep_research/steps/mcp_step.py | Created the MCP-driven search step using Anthropic + Exa |
| deep_research/steps/generate_reflection_step.py | Updated default LLM model to OpenRouter Gemini |
| deep_research/steps/execute_approved_searches_step.py | Updated default LLM model to OpenRouter Gemini |
| deep_research/steps/cross_viewpoint_step.py | Updated default LLM model to OpenRouter Gemini |
| deep_research/run.py | Added fallback to config query and adjusted pipeline invocation |
| deep_research/requirements.txt | Added `anthropic` dependency |
| deep_research/pipelines/parallel_research_pipeline.py | Inserted `mcp_updates_step` into the pipeline flow |
| deep_research/materializers/mcp_result_materializer.py | New materializer for visualizing `MCPResult` |
| deep_research/materializers/\_\_init\_\_.py | Registered `MCPResultMaterializer` |
| deep_research/configs/enhanced_research.yaml | Updated LLM model versions for various steps |
| deep_research/README.md | Documented MCP integration and updated usage instructions |
| README.md | Added “Deep Research” entry with MCP support to project overview |

Comments suppressed due to low confidence (5)

deep_research/run.py:318

The fallback branch logs the absence of a query but never invokes the pipeline. Ensure you call pipeline(...) when no CLI or config query is provided.

else:

deep_research/steps/pydantic_final_report_step.py:195

Docstring args are out of sync with the signature: mcp_results was added before llm_model. Update the Args section accordingly.

def generate_executive_summary(

deep_research/steps/pydantic_final_report_step.py:284

Docstring is missing the new mcp_results parameter. Add it to the Args section to match the function signature.

def generate_introduction(

deep_research/steps/pydantic_final_report_step.py:37

[nitpick] Add unit tests for extract_mcp_content covering cases where only mcp_result, only raw_mcp_result, and neither are provided.

def extract_mcp_content(mcp_results: MCPResult) -> str:

deep_research/steps/pydantic_final_report_step.py:37

The function uses html.escape but the html module is not imported. Add 'import html' at the top.

def extract_mcp_content(mcp_results: MCPResult) -> str:
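
The two suppressed comments on `extract_mcp_content` (the missing `import html` and the three untested cases) can be illustrated together. This is a minimal sketch, not the PR's implementation: `MCPResult` here is a dataclass stand-in for the real Pydantic model, the field names `mcp_result` and `raw_mcp_result` come from the review comment, and preferring the processed result over the raw one is an assumption.

```python
import html  # the import the review comment says was missing
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPResult:
    """Stand-in for the PR's Pydantic model (field names from the review comment)."""

    raw_mcp_result: Optional[str] = None
    mcp_result: Optional[str] = None


def extract_mcp_content(mcp_results: MCPResult) -> str:
    """Return HTML-escaped MCP content, or an empty string if none is present.

    Assumption: the processed `mcp_result` is preferred and
    `raw_mcp_result` is the fallback.
    """
    content = mcp_results.mcp_result or mcp_results.raw_mcp_result or ""
    return html.escape(content)


if __name__ == "__main__":
    # The three cases the reviewer asked to cover, plus the "both set" case.
    print(extract_mcp_content(MCPResult(mcp_result="<b>processed</b>")))
    print(extract_mcp_content(MCPResult(raw_mcp_result="raw & unprocessed")))
    print(repr(extract_mcp_content(MCPResult())))
    print(extract_mcp_content(MCPResult(raw_mcp_result="raw", mcp_result="processed")))
```

Escaping at extraction time means the returned string is safe to embed in the HTML report, whichever field it came from.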

strickvl · 2025-06-04T16:09:36Z

Merging this to fix the images + to have the feature on main. Code is tested / working.

strickvl and others added 2 commits June 3, 2025 14:33

strickvl added enhancement New feature or request internal labels Jun 3, 2025

strickvl requested a review from htahir1 June 3, 2025 15:52

strickvl and others added 4 commits June 3, 2025 17:57

Fix README images

4fa7d32

Add Deep Research to main README

610f184

Fix typo in comment within compute_metrics function in misc.py

01f4b46

strickvl requested a review from Copilot June 4, 2025 15:09

Copilot AI reviewed Jun 4, 2025

deep_research/utils/pydantic_models.py (outdated, resolved)

deep_research/steps/mcp_step.py (resolved)

Update deep_research/utils/pydantic_models.py

da0df2e

Co-authored-by: Copilot <[email protected]>

strickvl merged commit 411d2fb into main Jun 4, 2025
5 checks passed

strickvl deleted the feature/mcp-dr branch June 4, 2025 16:09
