Docs: add an example of using RunContext to pass data among tools #2316

Merged · 8 commits · Aug 1, 2025
67 changes: 67 additions & 0 deletions docs/examples/data-analyst.md
@@ -0,0 +1,67 @@
# Data Analyst

Sometimes in an agent workflow, the agent does not need to know the exact tool
output, but still needs to process it further. This is especially common in
data analytics: the agent needs to know that the result of a query tool is a
`DataFrame` with certain named columns, but not necessarily the content of
every row.

With Pydantic AI, you can use a [dependencies object](../dependencies.md) to
store the result from one tool and use it in another tool.
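The pattern boils down to a small dependencies object that hands out short references (like `Out[1]`) instead of raw values, so tools can exchange large results without routing them through the model. A minimal, agent-free sketch (the `RefStore` name is illustrative, not part of the Pydantic AI API):

```python
from dataclasses import dataclass, field


@dataclass
class RefStore:
    # Maps reference names like "Out[1]" to stored values.
    output: dict[str, object] = field(default_factory=dict)

    def store(self, value: object) -> str:
        """Store a value and return a reference string for the LLM to pass around."""
        ref = f'Out[{len(self.output) + 1}]'
        self.output[ref] = value
        return ref

    def get(self, ref: str) -> object:
        """Resolve a reference produced by an earlier store() call."""
        if ref not in self.output:
            raise KeyError(f'{ref} is not a valid reference')
        return self.output[ref]


store = RefStore()
ref = store.store([1, 2, 3])
print(ref)             # → Out[1]
print(store.get(ref))  # → [1, 2, 3]
```

In the full example below, the same idea appears as `AnalystAgentDeps`, and the agent's tools receive it through `RunContext.deps`.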

In this example, we'll build an agent that analyzes the [Rotten Tomatoes movie review dataset from Cornell](https://huggingface.co/datasets/cornell-movie-review-data/rotten_tomatoes).


Demonstrates:

- [agent dependencies](../dependencies.md)
## Running the Example

With [dependencies installed and environment variables set](./index.md#usage), run:

```bash
python/uv-run -m pydantic_ai_examples.data_analyst
```


Output (debug):


> Based on my analysis of the Cornell Movie Review dataset (rotten_tomatoes), there are **4,265 negative comments** in the training split. These are the reviews labeled as 'neg' (represented by 0 in the dataset).



## Example Code

```snippet {path="/examples/pydantic_ai_examples/data_analyst.py"}```


## Appendix

### Choosing a Model

This example requires using a model that understands DuckDB SQL. You can check with `clai`:

```sh
> clai -m bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0
clai - Pydantic AI CLI v0.0.1.dev920+41dd069 with bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0
clai ➤ do you understand duckdb sql?
# DuckDB SQL

Yes, I understand DuckDB SQL. DuckDB is an in-process analytical SQL database
that uses syntax similar to PostgreSQL. It specializes in analytical queries
and is designed for high-performance analysis of structured data.

Some key features of DuckDB SQL include:

• OLAP (Online Analytical Processing) optimized
• Columnar-vectorized query execution
• Standard SQL support with PostgreSQL compatibility
• Support for complex analytical queries
• Efficient handling of CSV/Parquet/JSON files

I can help you with DuckDB SQL queries, schema design, optimization, or other
DuckDB-related questions.
```
2 changes: 1 addition & 1 deletion docs/tools.md
@@ -12,7 +12,7 @@ There are a number of ways to register tools with an agent:
- via the [`@agent.tool_plain`][pydantic_ai.Agent.tool_plain] decorator — for tools that do not need access to the agent [context][pydantic_ai.tools.RunContext]
- via the [`tools`][pydantic_ai.Agent.__init__] keyword argument to `Agent` which can take either plain functions, or instances of [`Tool`][pydantic_ai.tools.Tool]

For more advanced use cases, the [toolsets](toolsets.md) feature lets you manage collections of tools (built by you or providd by an [MCP server](mcp/client.md) or other [third party](#third-party-tools)) and register them with an agent in one go via the [`toolsets`][pydantic_ai.Agent.__init__] keyword argument to `Agent`.
For more advanced use cases, the [toolsets](toolsets.md) feature lets you manage collections of tools (built by you or provided by an [MCP server](mcp/client.md) or other [third party](#third-party-tools)) and register them with an agent in one go via the [`toolsets`][pydantic_ai.Agent.__init__] keyword argument to `Agent`.

!!! info "Function tools vs. RAG"
Function tools are basically the "R" of RAG (Retrieval-Augmented Generation) — they augment what the model can do by letting it request extra information.
107 changes: 107 additions & 0 deletions examples/pydantic_ai_examples/data_analyst.py
@@ -0,0 +1,107 @@
from dataclasses import dataclass, field

import datasets
import duckdb
import pandas as pd

from pydantic_ai import Agent, ModelRetry, RunContext


@dataclass
class AnalystAgentDeps:
output: dict[str, pd.DataFrame] = field(default_factory=dict)

def store(self, value: pd.DataFrame) -> str:
"""Store the output in deps and return the reference such as Out[1] to be used by the LLM."""
ref = f'Out[{len(self.output) + 1}]'
self.output[ref] = value
return ref

def get(self, ref: str) -> pd.DataFrame:
if ref not in self.output:
raise ModelRetry(
f'Error: {ref} is not a valid variable reference. Check the previous messages and try again.'
)
return self.output[ref]


analyst_agent = Agent(
'openai:gpt-4o',
deps_type=AnalystAgentDeps,
instructions='You are a data analyst and your job is to analyze the data according to the user request.',
)


@analyst_agent.tool
def load_dataset(
ctx: RunContext[AnalystAgentDeps],
path: str,
split: str = 'train',
) -> str:
"""Load the `split` of dataset `dataset_name` from huggingface.

Args:
ctx: Pydantic AI agent RunContext
path: name of the dataset in the form of `<user_name>/<dataset_name>`
split: load the split of the dataset (default: "train")
"""
# begin load data from hf
builder = datasets.load_dataset_builder(path) # pyright: ignore[reportUnknownMemberType]
splits: dict[str, datasets.SplitInfo] = builder.info.splits or {} # pyright: ignore[reportUnknownMemberType]
if split not in splits:
raise ModelRetry(
f'{split} is not valid for dataset {path}. Valid splits are {",".join(splits.keys())}'
)

builder.download_and_prepare() # pyright: ignore[reportUnknownMemberType]
dataset = builder.as_dataset(split=split)
assert isinstance(dataset, datasets.Dataset)
dataframe = dataset.to_pandas()
assert isinstance(dataframe, pd.DataFrame)
# end load data from hf

# store the dataframe in the deps and get a ref like "Out[1]"
ref = ctx.deps.store(dataframe)
# construct a summary of the loaded dataset
output = [
f'Loaded the dataset as `{ref}`.',
f'Description: {dataset.info.description}'
if dataset.info.description
else None,
f'Features: {dataset.info.features!r}' if dataset.info.features else None,
]
return '\n'.join(filter(None, output))


@analyst_agent.tool
def run_duckdb(ctx: RunContext[AnalystAgentDeps], dataset: str, sql: str) -> str:
"""Run DuckDB SQL query on the DataFrame.

Note that the virtual table name used in DuckDB SQL must be `dataset`.

Args:
ctx: Pydantic AI agent RunContext
dataset: reference string to the DataFrame
sql: the query to be executed using DuckDB
"""
data = ctx.deps.get(dataset)
result = duckdb.query_df(df=data, virtual_table_name='dataset', sql_query=sql)
# pass the result as ref (because DuckDB SQL can select many rows, creating another huge dataframe)
ref = ctx.deps.store(result.df()) # pyright: ignore[reportUnknownMemberType]
return f'Executed SQL, result is `{ref}`'


@analyst_agent.tool
def display(ctx: RunContext[AnalystAgentDeps], name: str) -> str:
"""Display at most 5 rows of the dataframe."""
dataset = ctx.deps.get(name)
return dataset.head().to_string() # pyright: ignore[reportUnknownMemberType]


if __name__ == '__main__':
deps = AnalystAgentDeps()
result = analyst_agent.run_sync(
user_prompt='Count how many negative comments are there in the dataset `cornell-movie-review-data/rotten_tomatoes`',
deps=deps,
)
print(result.output)
3 changes: 3 additions & 0 deletions examples/pyproject.toml
@@ -60,6 +60,9 @@ dependencies = [
"gradio>=5.9.0; python_version>'3.9'",
"mcp[cli]>=1.4.1; python_version >= '3.10'",
"modal>=1.0.4",
"duckdb>=1.3.2",
"datasets>=4.0.0",
"pandas>=2.2.3",
]

[tool.hatch.build.targets.wheel]
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -66,6 +66,7 @@ nav:
- examples/chat-app.md
- examples/question-graph.md
- examples/slack-lead-qualifier.md
- examples/data-analyst.md

- API Reference:
- api/ag_ui.md