This project demonstrates how to build, version, and evaluate AI agents using Azure AI Agent Service and Neon Serverless Postgres. It is designed for developers and QA engineers to safely test agent behavior variations, log structured evaluation metrics, and compare versions side-by-side.
- Define multiple AI agent versions with different prompts or toolsets
- Store configurations and responses in version-isolated Postgres branches
- Log QA metrics such as:
  - Response time
  - Response length
  - Keyword coverage
  - Tool usage
- Query logs to compare agent performance
- Python 3.9+
- An Azure subscription (create one)
- Azure AI Developer RBAC role
- Neon Serverless Postgres (install on Azure)
- Visit the Neon Azure portal
- Deploy your database, then access the Neon Console
- Create a project
- Create two branches from main: v1 and v2
- Copy connection strings for both branches
- Go to the Azure AI Foundry portal or follow the guide.
- Create a hub and project
- Deploy a model (e.g., GPT-4o)
- Get your project connection string and model deployment name
git clone https://github.com/neondatabase-labs/neon-azure-multi-agent-evaluation.git
cd neon-azure-multi-agent-evaluation
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
Create a .env file with the following:
AGENT_VERSION=v1
NEON_DB_CONNECTION_STRING_V1=your_neon_connection_string_branch_v1
NEON_DB_CONNECTION_STRING_V2=your_neon_connection_string_branch_v2
PROJECT_CONNECTION_STRING=your_azure_project_connection_string
AZURE_OPENAI_DEPLOYMENT_NAME=your_azure_openai_model
Switch AGENT_VERSION between v1 and v2 to test different branches.
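As a rough illustration of how AGENT_VERSION can select a branch, the sketch below maps the version to the matching NEON_DB_CONNECTION_STRING_* variable. The helper name `resolve_connection_string` and the fallback to v1 are assumptions made for this example, not necessarily what agents.py does. It also assumes the variables are already in the process environment (for instance, loaded from .env beforehand).

```python
import os


def resolve_connection_string(env=None):
    """Return (version, connection_string) for the branch named by AGENT_VERSION.

    Hypothetical helper: the variable names come from the .env example,
    but the lookup logic and the "v1" default are assumptions.
    """
    env = os.environ if env is None else env
    version = env.get("AGENT_VERSION", "v1").lower()
    key = f"NEON_DB_CONNECTION_STRING_{version.upper()}"
    try:
        return version, env[key]
    except KeyError:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"{key} is not set; check your .env file") from None


# Usage with an explicit mapping (placeholder values, not real credentials).
sample_env = {
    "AGENT_VERSION": "v2",
    "NEON_DB_CONNECTION_STRING_V1": "postgresql://placeholder-branch-v1",
    "NEON_DB_CONNECTION_STRING_V2": "postgresql://placeholder-branch-v2",
}
version, conn_str = resolve_connection_string(sample_env)
print(version, conn_str)
```

Passing the environment as a parameter keeps the lookup easy to test without touching real environment variables.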
python agents.py
The script will:
- Create an agent for the current version
- Log its configuration
- Run the agent with a fixed prompt
- Log the response with QA metrics
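The QA metrics logged in the last step could be computed along these lines. This is a minimal sketch, not the code in agents.py: the names `QAMetrics`, `score_response`, and `timed` are hypothetical, response length is counted in words, and "heuristic success" is assumed to mean that keyword coverage reaches a threshold.

```python
import time
from dataclasses import dataclass


@dataclass
class QAMetrics:
    latency: float            # seconds spent producing the response
    response_length: int      # number of whitespace-separated words
    keyword_coverage: float   # fraction of expected keywords found (0.0 to 1.0)
    tool_used: bool           # whether the agent invoked a tool
    heuristic_success: bool   # assumed rule: coverage >= min_coverage


def score_response(response, latency, expected_keywords,
                   tool_used=False, min_coverage=0.5):
    """Compute simple QA metrics for one agent response."""
    text = response.lower()
    hits = [kw for kw in expected_keywords if kw.lower() in text]
    coverage = len(hits) / len(expected_keywords) if expected_keywords else 0.0
    return QAMetrics(
        latency=latency,
        response_length=len(response.split()),
        keyword_coverage=coverage,
        tool_used=tool_used,
        heuristic_success=coverage >= min_coverage,
    )


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Usage with a stand-in for the real agent call.
def fake_agent(prompt):
    return "Neon branches isolate each agent version."


reply, latency = timed(fake_agent, "Explain branch isolation.")
metrics = score_response(reply, latency, ["neon", "branch", "latency"])
print(metrics)
```

Substring matching is deliberately crude ("branch" also matches "branches"). A stricter evaluation might tokenize the response first.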
Use SQL to analyze results per version:
SELECT version,
       COUNT(*) AS total_runs,
       AVG(response_length) AS avg_words,
       AVG(latency) AS avg_response_time,
       AVG(CASE WHEN heuristic_success THEN 1 ELSE 0 END) * 100 AS success_rate
FROM agent_logs
JOIN agent_configs ON agent_logs.config_id = agent_configs.id
GROUP BY version;
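To see what the comparison query returns without a Neon connection, the sketch below runs it against an in-memory SQLite database as a stand-in for Postgres. The table layout is inferred only from the columns the query references, the real schema may have more fields, and the inserted rows are made-up sample data. It also puts both versions in one database for brevity, whereas in the project each version's rows presumably live on their own branch. An ORDER BY is added so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite stands in for a Neon Postgres branch (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_configs (
    id INTEGER PRIMARY KEY,
    version TEXT NOT NULL
);
CREATE TABLE agent_logs (
    id INTEGER PRIMARY KEY,
    config_id INTEGER REFERENCES agent_configs(id),
    response_length INTEGER,
    latency REAL,
    heuristic_success BOOLEAN
);
-- Illustrative sample rows only; not real evaluation results.
INSERT INTO agent_configs (id, version) VALUES (1, 'v1'), (2, 'v2');
INSERT INTO agent_logs (config_id, response_length, latency, heuristic_success)
VALUES (1, 100, 2.0, 1),
       (1, 140, 3.0, 0),
       (2,  80, 1.0, 1),
       (2, 120, 2.0, 1);
""")

QUERY = """
SELECT version,
       COUNT(*) AS total_runs,
       AVG(response_length) AS avg_words,
       AVG(latency) AS avg_response_time,
       AVG(CASE WHEN heuristic_success THEN 1 ELSE 0 END) * 100 AS success_rate
FROM agent_logs
JOIN agent_configs ON agent_logs.config_id = agent_configs.id
GROUP BY version
ORDER BY version;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

Each row reads as (version, total_runs, avg_words, avg_response_time, success_rate), with success_rate expressed as a percentage.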