Docs: add an example of using RunContext to pass data among tools #2316

Merged · 8 commits · Aug 1, 2025
67 changes: 67 additions & 0 deletions docs/examples/data-analyst.md
@@ -0,0 +1,67 @@
# Data Analyst

Sometimes in an agent workflow, the agent does not need to know the exact tool
output, but still needs to process it further. This is especially common in
data analytics: the agent needs to know that the result of a query tool is a
`DataFrame` with certain named columns, but not necessarily the content of
every row.

With Pydantic AI, you can use a [dependencies object](../dependencies.md) to
store the result from one tool and use it in another tool.
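The pattern boils down to a small dependencies object that hands out short references (like `Out[1]`) instead of raw values, so tools can exchange large results without routing them through the model. A minimal, agent-free sketch (the `RefStore` name is illustrative, not part of the Pydantic AI API):

```python
from dataclasses import dataclass, field


@dataclass
class RefStore:
    # Maps reference names like "Out[1]" to stored values.
    output: dict[str, object] = field(default_factory=dict)

    def store(self, value: object) -> str:
        """Store a value and return a reference string for the LLM to pass around."""
        ref = f'Out[{len(self.output) + 1}]'
        self.output[ref] = value
        return ref

    def get(self, ref: str) -> object:
        """Resolve a reference produced by an earlier store() call."""
        if ref not in self.output:
            raise KeyError(f'{ref} is not a valid reference')
        return self.output[ref]


store = RefStore()
ref = store.store([1, 2, 3])
print(ref)             # → Out[1]
print(store.get(ref))  # → [1, 2, 3]
```

In the full example below, the same idea appears as `AnalystAgentDeps`, and the agent's tools receive it through `RunContext.deps`.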

In this example, we'll build an agent that analyzes the [Rotten Tomatoes movie review dataset from Cornell](https://huggingface.co/datasets/cornell-movie-review-data/rotten_tomatoes).


Demonstrates:

- [agent dependencies](../dependencies.md)
## Running the Example

With [dependencies installed and environment variables set](./index.md#usage), run:

```bash
python/uv-run -m pydantic_ai_examples.data_analyst
```


Output (debug):


> Based on my analysis of the Cornell Movie Review dataset (rotten_tomatoes), there are **4,265 negative comments** in the training split. These are the reviews labeled as 'neg' (represented by 0 in the dataset).



## Example Code

```snippet {path="/examples/pydantic_ai_examples/data_analyst.py"}```


## Appendix

### Choosing a Model

This example requires using a model that understands DuckDB SQL. You can check with `clai`:

```sh
> clai -m bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0
clai - Pydantic AI CLI v0.0.1.dev920+41dd069 with bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0
clai ➤ do you understand duckdb sql?
# DuckDB SQL

Yes, I understand DuckDB SQL. DuckDB is an in-process analytical SQL database
that uses syntax similar to PostgreSQL. It specializes in analytical queries
and is designed for high-performance analysis of structured data.

Some key features of DuckDB SQL include:

• OLAP (Online Analytical Processing) optimized
• Columnar-vectorized query execution
• Standard SQL support with PostgreSQL compatibility
• Support for complex analytical queries
• Efficient handling of CSV/Parquet/JSON files

I can help you with DuckDB SQL queries, schema design, optimization, or other
DuckDB-related questions.
```
2 changes: 1 addition & 1 deletion docs/tools.md
@@ -12,7 +12,7 @@ There are a number of ways to register tools with an agent:
- via the [`@agent.tool_plain`][pydantic_ai.Agent.tool_plain] decorator — for tools that do not need access to the agent [context][pydantic_ai.tools.RunContext]
- via the [`tools`][pydantic_ai.Agent.__init__] keyword argument to `Agent` which can take either plain functions, or instances of [`Tool`][pydantic_ai.tools.Tool]

For more advanced use cases, the [toolsets](toolsets.md) feature lets you manage collections of tools (built by you or providd by an [MCP server](mcp/client.md) or other [third party](#third-party-tools)) and register them with an agent in one go via the [`toolsets`][pydantic_ai.Agent.__init__] keyword argument to `Agent`.
For more advanced use cases, the [toolsets](toolsets.md) feature lets you manage collections of tools (built by you or provided by an [MCP server](mcp/client.md) or other [third party](#third-party-tools)) and register them with an agent in one go via the [`toolsets`][pydantic_ai.Agent.__init__] keyword argument to `Agent`.

!!! info "Function tools vs. RAG"
Function tools are basically the "R" of RAG (Retrieval-Augmented Generation) — they augment what the model can do by letting it request extra information.
107 changes: 107 additions & 0 deletions examples/pydantic_ai_examples/data_analyst.py
@@ -0,0 +1,107 @@
from dataclasses import dataclass, field

import datasets
import duckdb
import pandas as pd

from pydantic_ai import Agent, ModelRetry, RunContext


@dataclass
class AnalystAgentDeps:
output: dict[str, pd.DataFrame] = field(default_factory=dict)

def store(self, value: pd.DataFrame) -> str:
"""Store the output in deps and return the reference such as Out[1] to be used by the LLM."""
ref = f'Out[{len(self.output) + 1}]'
self.output[ref] = value
return ref

def get(self, ref: str) -> pd.DataFrame:
if ref not in self.output:
raise ModelRetry(
f'Error: {ref} is not a valid variable reference. Check the previous messages and try again.'
)
return self.output[ref]


analyst_agent = Agent(
'openai:gpt-4o',
deps_type=AnalystAgentDeps,
instructions='You are a data analyst and your job is to analyze the data according to the user request.',
)


@analyst_agent.tool
def load_dataset(
ctx: RunContext[AnalystAgentDeps],
path: str,
split: str = 'train',
) -> str:
"""Load the `split` of dataset `dataset_name` from huggingface.

Args:
ctx: Pydantic AI agent RunContext
path: name of the dataset in the form of `<user_name>/<dataset_name>`
split: load the split of the dataset (default: "train")
"""
# begin load data from hf
builder = datasets.load_dataset_builder(path) # pyright: ignore[reportUnknownMemberType]
splits: dict[str, datasets.SplitInfo] = builder.info.splits or {} # pyright: ignore[reportUnknownMemberType]
if split not in splits:
raise ModelRetry(
f'{split} is not valid for dataset {path}. Valid splits are {",".join(splits.keys())}'
)

builder.download_and_prepare() # pyright: ignore[reportUnknownMemberType]
dataset = builder.as_dataset(split=split)
assert isinstance(dataset, datasets.Dataset)
dataframe = dataset.to_pandas()
assert isinstance(dataframe, pd.DataFrame)
# end load data from hf

# store the dataframe in the deps and get a ref like "Out[1]"
ref = ctx.deps.store(dataframe)
# construct a summary of the loaded dataset
output = [
f'Loaded the dataset as `{ref}`.',
f'Description: {dataset.info.description}'
if dataset.info.description
else None,
f'Features: {dataset.info.features!r}' if dataset.info.features else None,
]
return '\n'.join(filter(None, output))


@analyst_agent.tool
def run_duckdb(ctx: RunContext[AnalystAgentDeps], dataset: str, sql: str) -> str:
"""Run DuckDB SQL query on the DataFrame.

Note that the virtual table name used in DuckDB SQL must be `dataset`.

Args:
ctx: Pydantic AI agent RunContext
dataset: reference string to the DataFrame
sql: the query to be executed using DuckDB
"""
data = ctx.deps.get(dataset)
result = duckdb.query_df(df=data, virtual_table_name='dataset', sql_query=sql)
# pass the result as ref (because DuckDB SQL can select many rows, creating another huge dataframe)
ref = ctx.deps.store(result.df()) # pyright: ignore[reportUnknownMemberType]
return f'Executed SQL, result is `{ref}`'


@analyst_agent.tool
def display(ctx: RunContext[AnalystAgentDeps], name: str) -> str:
"""Display at most 5 rows of the dataframe."""
dataset = ctx.deps.get(name)
return dataset.head().to_string() # pyright: ignore[reportUnknownMemberType]


if __name__ == '__main__':
deps = AnalystAgentDeps()
result = analyst_agent.run_sync(
user_prompt='Count how many negative comments are there in the dataset `cornell-movie-review-data/rotten_tomatoes`',
deps=deps,
)
print(result.output)
3 changes: 3 additions & 0 deletions examples/pyproject.toml
@@ -60,6 +60,9 @@ dependencies = [
"gradio>=5.9.0; python_version>'3.9'",
"mcp[cli]>=1.4.1; python_version >= '3.10'",
"modal>=1.0.4",
"duckdb>=1.3.2",
"datasets>=4.0.0",
"pandas>=2.2.3",
]

[tool.hatch.build.targets.wheel]
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -66,6 +66,7 @@ nav:
- examples/chat-app.md
- examples/question-graph.md
- examples/slack-lead-qualifier.md
- examples/data-analyst.md

- API Reference:
- api/ag_ui.md