Add MCP (Model Context Protocol) integration for enhanced research #230

Merged
merged 7 commits on Jun 4, 2025
1 change: 1 addition & 0 deletions README.md
@@ -58,6 +58,7 @@ etc.
| [Gamesense](gamesense) | 🤖 LLMOps | 🧠 LoRA, ⚡ Efficient Training | pytorch, peft, phi-2 |
| [Nightwatch AI](nightwatch-ai) | 🤖 LLMOps | 📝 Summarization, 📊 Reporting | openai, supabase, slack |
| [ResearchRadar](research-radar) | 🤖 LLMOps | 📝 Classification, 📊 Comparison | anthropic, huggingface, transformers |
| [Deep Research](deep_research) | 🤖 LLMOps | 📝 Research, 📊 Reporting, 🔍 Web Search | anthropic, mcp, agents, openai |
| [End-to-end Computer Vision](end-to-end-computer-vision) | 👁 CV | 🔎 Object Detection, 🏷️ Labeling | pytorch, label_studio, yolov8 |
| [Magic Photobooth](magic-photobooth) | 👁 CV | 📷 Image Gen, 🎞️ Video Gen | stable-diffusion, huggingface |
| [OmniReader](omni-reader) | 👁 CV | 📑 OCR, 📊 Evaluation, ⚙️ Batch Processing | polars, litellm, openai, ollama |
48 changes: 40 additions & 8 deletions deep_research/README.md
@@ -13,6 +13,7 @@ The ZenML Deep Research Agent is a scalable, modular pipeline that automates in-depth

- Creates a structured outline based on your research query
- Researches each section through targeted web searches and LLM analysis
- **NEW**: Performs additional MCP-powered searches using Anthropic's Model Context Protocol with Exa integration
- Iteratively refines content through reflection cycles
- Produces a comprehensive, well-formatted research report
- Visualizes the research process and report structure in the ZenML dashboard
@@ -24,7 +25,7 @@ This project transforms exploratory notebook-based research into a production-grade
The Deep Research Agent produces comprehensive, well-structured reports on any topic. Here's an example of research conducted on quantum computing:

<div align="center">
<img alt="Sample Research Report" src="assets/sample_report.png" width="70%">
<img alt="Sample Research Report" src="assets/sample_report.gif" width="70%">
<p><em>Sample report generated by the Deep Research Agent</em></p>
</div>

@@ -40,8 +41,9 @@ The pipeline uses a parallel processing architecture for efficiency and breaks down
6. **Reflection Generation**: Generate recommendations for improving research quality
7. **Human Approval** (optional): Get human approval for additional searches
8. **Execute Approved Searches**: Perform approved additional searches to fill gaps
9. **Final Report Generation**: Compile all synthesized information into a coherent HTML report
10. **Collect Tracing Metadata**: Gather comprehensive metrics about token usage, costs, and performance
9. **MCP-Powered Search**: Use Anthropic's Model Context Protocol to perform additional targeted searches via Exa
10. **Final Report Generation**: Compile all synthesized information into a coherent HTML report
11. **Collect Tracing Metadata**: Gather comprehensive metrics about token usage, costs, and performance

This architecture enables:
- Better reproducibility and caching of intermediate results
@@ -55,6 +57,7 @@ This architecture enables:

- **LLM Integration**: Uses litellm for flexible access to various LLM providers
- **Web Research**: Utilizes Tavily API for targeted internet searches
- **MCP Integration**: Leverages Anthropic's Model Context Protocol with Exa for enhanced research capabilities
- **ZenML Orchestration**: Manages pipeline flow, artifacts, and caching
- **Reproducibility**: Track every step, parameter, and output via ZenML
- **Visualizations**: Interactive visualizations of the research structure and progress
@@ -70,6 +73,8 @@
- ZenML installed and configured
- API key for your preferred LLM provider (configured with litellm)
- Tavily API key
- Anthropic API key (for MCP integration)
- Exa API key (for MCP-powered searches)
- Langfuse account for LLM tracking (optional but recommended)

### Installation
@@ -85,7 +90,8 @@ pip install -r requirements.txt
# Set up API keys
export OPENAI_API_KEY=your_openai_key # Or another LLM provider key
export TAVILY_API_KEY=your_tavily_key # For Tavily search (default)
export EXA_API_KEY=your_exa_key # For Exa search (optional)
export EXA_API_KEY=your_exa_key # For Exa search and MCP integration (required for MCP)
export ANTHROPIC_API_KEY=your_anthropic_key # For MCP integration (required)

# Set up Langfuse for LLM tracking (optional)
export LANGFUSE_PUBLIC_KEY=your_public_key
@@ -227,6 +233,31 @@ python run.py --num-results 5 # Get 5 results per search
python run.py --num-results 10 --search-provider exa # 10 results with Exa
```

### MCP (Model Context Protocol) Integration

The pipeline includes a powerful MCP integration step that uses Anthropic's Model Context Protocol to perform additional targeted searches. This step runs after the reflection phase and before final report generation, providing an extra layer of research depth.

#### How MCP Works

The MCP step:
1. Receives the synthesized research data and analysis from previous steps
2. Uses Claude (via Anthropic API) with MCP tools to identify gaps or areas needing more research
3. Performs targeted searches using Exa's advanced search capabilities including:
- `research_paper_search`: Academic paper and research content
- `company_research`: Company website crawling for business information
- `competitor_finder`: Find company competitors
- `linkedin_search`: Search LinkedIn for companies and people
- `wikipedia_search_exa`: Wikipedia article retrieval
- `github_search`: GitHub repositories and issues

#### MCP Requirements

To use the MCP integration, you need:
- `ANTHROPIC_API_KEY`: For accessing Claude with MCP capabilities
- `EXA_API_KEY`: For the Exa search tools used by MCP

The MCP step uses Claude Sonnet 4 (claude-sonnet-4-20250514), which supports MCP.
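
To make the flow concrete, below is a minimal, hypothetical sketch of how the step might attach Exa's hosted MCP server to a Claude request via Anthropic's MCP connector (beta). The endpoint URL, beta flag, and request shape are assumptions for illustration only; see `mcp_step.py` in this PR for the actual implementation.

```python
# Hypothetical sketch: how the MCP step might call Claude with Exa's MCP server.
# The server URL, beta flag, and request shape are assumptions based on
# Anthropic's MCP connector (beta) and Exa's hosted MCP server; check the
# pipeline's mcp_step.py for the real implementation.
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["mcp-client-2025-04-04"],  # assumed beta flag for the MCP connector
    mcp_servers=[
        {
            "type": "url",
            # Assumed Exa MCP endpoint; the real URL/auth scheme may differ.
            "url": f"https://mcp.exa.ai/mcp?exaApiKey={os.environ['EXA_API_KEY']}",
            "name": "exa",
        }
    ],
    messages=[
        {
            "role": "user",
            "content": (
                "Here is the synthesized research so far:\n"
                "<synthesis>...</synthesis>\n\n"
                "Identify remaining gaps and use the available Exa tools "
                "(e.g. research_paper_search, wikipedia_search_exa, github_search) "
                "to fill them. Summarize what you find."
            ),
        }
    ],
)
print(response.content)
```

In this sketch the Exa tools run behind the MCP server, so the step itself only needs the two API keys listed above.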

### Search Providers

The pipeline supports multiple search providers for flexibility and comparison:
@@ -364,6 +395,7 @@ zenml_deep_research/
│ ├── execute_approved_searches_step.py # Execute approved searches
│ ├── generate_reflection_step.py # Generate reflection without execution
│ ├── iterative_reflection_step.py # Legacy combined reflection step
│ ├── mcp_step.py # MCP integration for additional searches
│ ├── merge_results_step.py
│ ├── process_sub_question_step.py
│ ├── pydantic_final_report_step.py
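
For orientation, here is a hypothetical skeleton of what the new `mcp_step.py` might look like as a ZenML step; the function name, parameter types, and output are assumptions made for illustration, not the PR's actual code.

```python
# Hypothetical skeleton of steps/mcp_step.py — names, parameter types, and the
# return type are assumptions for illustration; the real step may use
# Pydantic models and a dedicated MCPResult output.
from typing import Annotated

from zenml import step


@step
def mcp_step(
    synthesized_data: dict,   # synthesized findings from earlier steps (assumed type)
    reflection_output: dict,  # gaps identified during reflection (assumed type)
) -> Annotated[dict, "mcp_results"]:
    """Run additional targeted searches via Claude + Exa MCP tools."""
    # 1. Build a prompt from the synthesized research and the identified gaps.
    # 2. Call the Anthropic API with the Exa MCP server attached
    #    (see the connector sketch in the MCP section above).
    # 3. Parse the tool results and return them for the final report step.
    raise NotImplementedError("illustrative skeleton only")
```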
@@ -421,16 +453,16 @@ query: "Climate change policy debates"
steps:
initial_query_decomposition_step:
parameters:
llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
llm_model: "google/gemini-2.0-flash-lite-001"

cross_viewpoint_analysis_step:
parameters:
llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
llm_model: "google/gemini-2.0-flash-lite-001"
viewpoint_categories: ["scientific", "political", "economic", "social", "ethical", "historical"]

iterative_reflection_step:
parameters:
llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
llm_model: "google/gemini-2.0-flash-lite-001"
max_additional_searches: 2
num_results_per_search: 3

Expand All @@ -442,7 +474,7 @@ steps:

pydantic_final_report_step:
parameters:
llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
llm_model: "google/gemini-2.0-flash-lite-001"

# Environment settings
settings:
Binary file added deep_research/assets/pipeline_visualization.png
Binary file added deep_research/assets/sample_report.gif
10 changes: 5 additions & 5 deletions deep_research/configs/enhanced_research.yaml
@@ -27,11 +27,11 @@ langfuse_project_name: "deep-research"
steps:
initial_query_decomposition_step:
parameters:
llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"

cross_viewpoint_analysis_step:
parameters:
llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"
viewpoint_categories:
[
"scientific",
@@ -44,7 +44,7 @@ steps:

generate_reflection_step:
parameters:
llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"

get_research_approval_step:
parameters:
@@ -53,11 +53,11 @@

execute_approved_searches_step:
parameters:
llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"

pydantic_final_report_step:
parameters:
llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"

# Environment settings
settings:
2 changes: 2 additions & 0 deletions deep_research/materializers/__init__.py
@@ -8,6 +8,7 @@
from .analysis_data_materializer import AnalysisDataMaterializer
from .approval_decision_materializer import ApprovalDecisionMaterializer
from .final_report_materializer import FinalReportMaterializer
from .mcp_result_materializer import MCPResultMaterializer
from .prompt_materializer import PromptMaterializer
from .query_context_materializer import QueryContextMaterializer
from .search_data_materializer import SearchDataMaterializer
@@ -23,4 +24,5 @@
"SynthesisDataMaterializer",
"AnalysisDataMaterializer",
"FinalReportMaterializer",
"MCPResultMaterializer",
]
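
The new `MCPResultMaterializer` tells ZenML how to persist and reload the MCP step's output as a pipeline artifact. Below is a minimal sketch of what such a materializer typically looks like; the result model, file name, and serialization format are assumptions for illustration, so consult `mcp_result_materializer.py` in the PR for the real implementation.

```python
# Hypothetical sketch of materializers/mcp_result_materializer.py — the MCPResult
# type, file name, and JSON layout are assumptions; shown only to illustrate the
# standard ZenML pattern for persisting a custom step output.
import json
from typing import Any, Optional, Type

from zenml.enums import ArtifactType
from zenml.io import fileio
from zenml.materializers.base_materializer import BaseMaterializer


class MCPResult:
    """Placeholder for the pipeline's actual MCP result model (assumed shape)."""

    def __init__(self, raw_response: str = "", searches: Optional[list] = None):
        self.raw_response = raw_response
        self.searches = searches or []


class MCPResultMaterializer(BaseMaterializer):
    ASSOCIATED_TYPES = (MCPResult,)
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def load(self, data_type: Type[Any]) -> MCPResult:
        # Read the artifact back from the artifact store.
        with fileio.open(f"{self.uri}/mcp_result.json", "r") as f:
            payload = json.load(f)
        return MCPResult(**payload)

    def save(self, data: MCPResult) -> None:
        # Serialize the MCP output as JSON inside the artifact store.
        with fileio.open(f"{self.uri}/mcp_result.json", "w") as f:
            json.dump(
                {"raw_response": data.raw_response, "searches": data.searches}, f
            )
```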