Docs: add an example of using RunContext to pass data among tools #2316
Merged: 8 commits, +504 −6
Commits:

- d00d80c Add example of using RunContext to share data among tools (tonyxwz)
- c58fbbc Fix spell in docs (tonyxwz)
- 95cfaa6 Apply suggestions from code review (tonyxwz)
- 5fea8bd docs(example): move the example output next to the run command (tonyxwz)
- 2f1021f address comments of DouweM in the data analyst example (tonyxwz)
- 9c425af fix: surround markdown list with new line (tonyxwz)
- d6ddec7 fix: use just print, change example output to quote (tonyxwz)
- ae9943e fix: address more comments (tonyxwz)
@@ -0,0 +1,67 @@
# Data Analyst

Sometimes in an agent workflow, the agent does not need to know the exact tool
output, but still needs to process the tool output in some way. This is
especially common in data analytics: the agent needs to know that the result of a
query tool is a `DataFrame` with certain named columns, but not
necessarily the content of every single row.

With Pydantic AI, you can use a [dependencies object](../dependencies.md) to
store the result from one tool and use it in another tool.
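
The store-and-reference idea can be sketched without Pydantic AI at all. In the minimal sketch below, one function stores a large result in a shared object and returns only a short reference string, and a second function resolves that reference to work on the full data. The names `ResultStore`, `load_rows`, and `count_rows` are hypothetical, and a plain list of dicts stands in for a `DataFrame`.

```python
from dataclasses import dataclass, field


@dataclass
class ResultStore:
    """Hypothetical stand-in for a dependencies object shared by tools."""

    results: dict[str, list[dict]] = field(default_factory=dict)

    def store(self, value: list[dict]) -> str:
        # Hand back a short reference instead of the data itself.
        ref = f'Out[{len(self.results) + 1}]'
        self.results[ref] = value
        return ref

    def get(self, ref: str) -> list[dict]:
        return self.results[ref]


def load_rows(store: ResultStore) -> str:
    """First 'tool': produce a large result, return only a short summary."""
    rows = [{'text': f'review {i}', 'label': i % 2} for i in range(1000)]
    ref = store.store(rows)
    return f'Loaded {len(rows)} rows as `{ref}` with columns: text, label'


def count_rows(store: ResultStore, ref: str, label: int) -> int:
    """Second 'tool': resolve the reference and work on the full data."""
    return sum(1 for row in store.get(ref) if row['label'] == label)


store = ResultStore()
summary = load_rows(store)  # short text; the 1000 rows stay in the store
negatives = count_rows(store, 'Out[1]', label=0)
```

Only `summary` would need to reach the model in this sketch; the rows themselves never leave `store`.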

In this example, we'll build an agent that analyzes the [Rotten Tomatoes movie review dataset from Cornell](https://huggingface.co/datasets/cornell-movie-review-data/rotten_tomatoes).


Demonstrates:

- [agent dependencies](../dependencies.md)


## Running the Example

With [dependencies installed and environment variables set](./index.md#usage), run:

```bash
python/uv-run -m pydantic_ai_examples.data_analyst
```


Output (debug):


> Based on my analysis of the Cornell Movie Review dataset (rotten_tomatoes), there are **4,265 negative comments** in the training split. These are the reviews labeled as 'neg' (represented by 0 in the dataset).



## Example Code

```snippet {path="/examples/pydantic_ai_examples/data_analyst.py"}```


## Appendix

### Choosing a Model

This example requires using a model that understands DuckDB SQL. You can check with `clai`:

```sh
> clai -m bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0
clai - Pydantic AI CLI v0.0.1.dev920+41dd069 with bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0
clai ➤ do you understand duckdb sql?
# DuckDB SQL

Yes, I understand DuckDB SQL. DuckDB is an in-process analytical SQL database
that uses syntax similar to PostgreSQL. It specializes in analytical queries
and is designed for high-performance analysis of structured data.

Some key features of DuckDB SQL include:

• OLAP (Online Analytical Processing) optimized
• Columnar-vectorized query execution
• Standard SQL support with PostgreSQL compatibility
• Support for complex analytical queries
• Efficient handling of CSV/Parquet/JSON files

I can help you with DuckDB SQL queries, schema design, optimization, or other
DuckDB-related questions.
```
@@ -0,0 +1,107 @@
from dataclasses import dataclass, field

import datasets
import duckdb
import pandas as pd

from pydantic_ai import Agent, ModelRetry, RunContext


@dataclass
class AnalystAgentDeps:
    output: dict[str, pd.DataFrame] = field(default_factory=dict)

    def store(self, value: pd.DataFrame) -> str:
        """Store the output in deps and return the reference such as Out[1] to be used by the LLM."""
        ref = f'Out[{len(self.output) + 1}]'
        self.output[ref] = value
        return ref

    def get(self, ref: str) -> pd.DataFrame:
        if ref not in self.output:
            raise ModelRetry(
                f'Error: {ref} is not a valid variable reference. Check the previous messages and try again.'
            )
        return self.output[ref]


analyst_agent = Agent(
    'openai:gpt-4o',
    deps_type=AnalystAgentDeps,
    instructions='You are a data analyst and your job is to analyze the data according to the user request.',
)


@analyst_agent.tool
def load_dataset(
    ctx: RunContext[AnalystAgentDeps],
    path: str,
    split: str = 'train',
) -> str:
    """Load the `split` of the dataset at `path` from huggingface.

    Args:
        ctx: Pydantic AI agent RunContext
        path: name of the dataset in the form of `<user_name>/<dataset_name>`
        split: load the split of the dataset (default: "train")
    """
    # begin load data from hf
    builder = datasets.load_dataset_builder(path)  # pyright: ignore[reportUnknownMemberType]
    splits: dict[str, datasets.SplitInfo] = builder.info.splits or {}  # pyright: ignore[reportUnknownMemberType]
    if split not in splits:
        raise ModelRetry(
            f'{split} is not valid for dataset {path}. Valid splits are {",".join(splits.keys())}'
        )

    builder.download_and_prepare()  # pyright: ignore[reportUnknownMemberType]
    dataset = builder.as_dataset(split=split)
    assert isinstance(dataset, datasets.Dataset)
    dataframe = dataset.to_pandas()
    assert isinstance(dataframe, pd.DataFrame)
    # end load data from hf

    # store the dataframe in the deps and get a ref like "Out[1]"
    ref = ctx.deps.store(dataframe)
    # construct a summary of the loaded dataset
    output = [
        f'Loaded the dataset as `{ref}`.',
        f'Description: {dataset.info.description}'
        if dataset.info.description
        else None,
        f'Features: {dataset.info.features!r}' if dataset.info.features else None,
    ]
    return '\n'.join(filter(None, output))


@analyst_agent.tool
def run_duckdb(ctx: RunContext[AnalystAgentDeps], dataset: str, sql: str) -> str:
    """Run DuckDB SQL query on the DataFrame.

    Note that the virtual table name used in DuckDB SQL must be `dataset`.

    Args:
        ctx: Pydantic AI agent RunContext
        dataset: reference string to the DataFrame
        sql: the query to be executed using DuckDB
    """
    data = ctx.deps.get(dataset)
    result = duckdb.query_df(df=data, virtual_table_name='dataset', sql_query=sql)
    # pass the result as ref (because DuckDB SQL can select many rows, creating another huge dataframe)
    ref = ctx.deps.store(result.df())  # pyright: ignore[reportUnknownMemberType]
    return f'Executed SQL, result is `{ref}`'


@analyst_agent.tool
def display(ctx: RunContext[AnalystAgentDeps], name: str) -> str:
    """Display at most 5 rows of the dataframe."""
    dataset = ctx.deps.get(name)
    return dataset.head().to_string()  # pyright: ignore[reportUnknownMemberType]


if __name__ == '__main__':
    deps = AnalystAgentDeps()
    result = analyst_agent.run_sync(
        user_prompt='Count how many negative comments are there in the dataset `cornell-movie-review-data/rotten_tomatoes`',
        deps=deps,
    )
    print(result.output)
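
To make the flow between the three tools easier to follow, here is a rough stdlib-only sketch of how references chain from one call to the next. It is not the code from this PR: `RetrySignal` is a hypothetical stand-in for `ModelRetry`, `Deps`, `load`, and `query` are illustrative names, and a list of integers stands in for a `DataFrame`. It assumes the model calls the tools in the order load, query, display.

```python
from dataclasses import dataclass, field


class RetrySignal(Exception):
    """Hypothetical stand-in for `ModelRetry`; carries a message for the model."""


@dataclass
class Deps:
    output: dict[str, list[int]] = field(default_factory=dict)

    def store(self, value: list[int]) -> str:
        ref = f'Out[{len(self.output) + 1}]'
        self.output[ref] = value
        return ref

    def get(self, ref: str) -> list[int]:
        if ref not in self.output:
            raise RetrySignal(f'{ref} is not a valid variable reference.')
        return self.output[ref]


def load(deps: Deps) -> str:
    # Stand-in for `load_dataset`: stores the full data, returns its reference.
    return deps.store(list(range(100)))


def query(deps: Deps, ref: str) -> str:
    # Stand-in for `run_duckdb`: the result is stored too, under a new reference.
    evens = [n for n in deps.get(ref) if n % 2 == 0]
    return deps.store(evens)


def display(deps: Deps, ref: str) -> str:
    # Stand-in for `display`: only the first 5 items are turned into text.
    return ', '.join(str(n) for n in deps.get(ref)[:5])


deps = Deps()
first = load(deps)               # 'Out[1]'
second = query(deps, first)      # 'Out[2]'
preview = display(deps, second)  # '0, 2, 4, 6, 8'

try:
    display(deps, 'Out[3]')  # no such reference has been stored
except RetrySignal as exc:
    error = str(exc)
```

Each step hands back a new reference rather than the data, so only `preview` and the error message would ever be sent to the model as text.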
https://github.com/pydantic/pydantic-ai/actions/runs/16605150201/job/46974951928#step:9:46
https://91de4635-pydantic-ai-previews.pydantic.workers.dev/