Secure Cloud-Native Vertex AI gRPC Proxy (Cloud Run + Terraform)

Description

This project provides a secure, cloud-native gRPC proxy service designed to run as a managed application on Google Cloud Run. It solves the common challenge of exposing Google Cloud's powerful Vertex AI Gemini endpoints, which require robust IAM-based authentication, to clients that prefer or require simpler static API Key authentication (sent via gRPC metadata).

Why this project? What problems does it solve?

While you can use SDKs like the Vercel AI SDK (@ai-sdk/google-vertex) to call Vertex AI directly using GCP authentication, this proxy offers an alternative pattern:

  1. Simplified Client Authentication: The primary goal is to enable clients (scripts, apps) to access Vertex AI using only a static API key (sent in the x-api-key gRPC metadata header). This significantly lowers the barrier to entry compared to implementing full GCP authentication flows on the client side.
  2. Secure Server-Side Authentication: The proxy leverages Cloud Run's IAM runtime identity to securely authenticate calls to Vertex AI. This eliminates the need for clients to possess GCP credentials and avoids the security risks of managing and distributing Service Account key files.
  3. Centralized Control & Abstraction: The Cloud Run gRPC proxy acts as a dedicated control point:
    • Validation & Guardrails: Implement custom validation or logic (e.g., prompt size limits) before hitting the LLM; a sketch of this idea follows this list.
    • Model Abstraction: Change the backend Vertex AI model (VERTEX_AI_MODEL_ID in server.ts) without clients needing updates.
    • Security Boundary: Limits direct exposure of your GCP project and Vertex AI resources.
  4. Robust Streaming: Uses gRPC streaming natively, which is generally more reliable and efficient for server-to-client streaming on Cloud Run compared to SSE over HTTP, potentially avoiding buffering issues seen with HTTP proxies.
  5. Managed Cloud-Native Infrastructure: Deploys a scalable, serverless Cloud Run application using Terraform, removing the burden of server management and ensuring reproducible, auditable infrastructure (VPC, Firewall, KMS/CMEK, IAM, Artifact Registry).
  6. Enhanced Security Layers: Integrates VPC network confinement and Customer-Managed Encryption Keys (CMEK) for Artifact Registry and Cloud Run runtime state, adhering to GCP best practices.

Architecture Overview & Authentication Flow

  1. Client (gRPC): Sends a gRPC request directly to the Cloud Run endpoint (*.run.app:443), including a static secret key in the x-api-key metadata header.
  2. Cloud Run: The Cloud Run service receives the gRPC request.
    • The service has ingress = all and allows unauthenticated invocations (requires Org Policy config).
  3. Application Code (server.ts): The Node.js/gRPC app in Cloud Run:
    • Receives the request.
    • Reads the x-api-key from the gRPC metadata.
    • Validates the key against the EXPECTED_API_KEY environment variable (fetched securely, see Security Considerations).
    • If valid, constructs the request for Vertex AI.
  4. Vertex AI: The Cloud Run service calls the Vertex AI API (unary or streaming), authenticating securely using its runtime service account identity (cloud_run_sa), which holds the roles/aiplatform.user role.
  5. Response: The response (unary or gRPC stream) flows back directly to the client.
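
For illustration, here is a minimal sketch of the server-side flow described above, using @grpc/grpc-js and @google-cloud/vertexai. The handler names, request/response field names, and the GCP_PROJECT/GCP_LOCATION variables are assumptions; the actual implementation lives in src/server.ts:

    import * as grpc from '@grpc/grpc-js';
    import { VertexAI } from '@google-cloud/vertexai';

    // Authentication to Vertex AI uses the Cloud Run runtime service account via
    // Application Default Credentials; no service account key files are involved.
    const vertex = new VertexAI({
      project: process.env.GCP_PROJECT!,               // assumed variable name
      location: process.env.GCP_LOCATION ?? 'us-central1',
    });
    const model = vertex.getGenerativeModel({
      model: process.env.VERTEX_AI_MODEL_ID ?? 'gemini-1.5-pro',
    });

    // Step 3 above: read x-api-key from the gRPC metadata and compare it to
    // the EXPECTED_API_KEY environment variable.
    function isAuthorized(metadata: grpc.Metadata): boolean {
      return metadata.get('x-api-key')[0] === process.env.EXPECTED_API_KEY;
    }

    // Unary handler.
    async function chat(
      call: grpc.ServerUnaryCall<{ prompt: string }, { text: string }>,
      callback: grpc.sendUnaryData<{ text: string }>,
    ) {
      if (!isAuthorized(call.metadata)) {
        return callback({
          code: grpc.status.UNAUTHENTICATED,
          details: 'Invalid or missing x-api-key',
        } as grpc.ServiceError);
      }
      const result = await model.generateContent(call.request.prompt);
      const text = result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
      callback(null, { text });
    }

    // Server-streaming handler: forward Vertex AI chunks to the client as they arrive.
    async function chatStream(call: grpc.ServerWritableStream<{ prompt: string }, { text: string }>) {
      if (!isAuthorized(call.metadata)) {
        call.emit('error', { code: grpc.status.UNAUTHENTICATED, details: 'Invalid or missing x-api-key' });
        return;
      }
      const streaming = await model.generateContentStream(call.request.prompt);
      for await (const chunk of streaming.stream) {
        const text = chunk.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
        if (text) call.write({ text });
      }
      call.end();
    }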

Technology Stack

  • Cloud Provider: Google Cloud Platform (GCP)
  • Infrastructure as Code: Terraform (Google provider ~> 5.0)
  • Compute: Cloud Run (v2) - Managed & Serverless (Gen2 required)
  • Networking: VPC Network, Cloud Firewall - Secure Isolation
  • Security: IAM Service Accounts, Key Management Service (KMS) - CMEK & Secure Identity, Secret Manager (Recommended)
  • Containerization: Docker, Artifact Registry - CMEK Encrypted
  • Application Runtime: Node.js (v20+)
  • RPC Framework: gRPC (@grpc/grpc-js, @grpc/proto-loader)
  • Interface Definition: Protocol Buffers (.proto)
  • GCP SDK (Server): @google-cloud/vertexai (Node.js)
  • CLI gRPC Client: @grpc/grpc-js
  • CLI Interactive Prompts: prompts (for user-friendly interactive input)
  • CLI Runner: tsx

Prerequisites

  1. GCP Project: A Google Cloud Project with Billing enabled. Note your Project ID and Project Number.
  2. Enabled APIs: Ensure the following APIs are enabled in your GCP project:
    • Cloud Run API
    • Vertex AI API
    • Artifact Registry API
    • Cloud Key Management Service (KMS) API
    • Compute Engine API
    • Cloud Resource Manager API
    • IAM API
    • Cloud Logging API
    • Secret Manager API (Recommended)
    • Cloud Build API (Optional, for CI/CD)
    • Service Control API (May still be needed by underlying services)
  3. Locally Installed Tools:
    • Terraform CLI (terraform)
    • Google Cloud SDK (gcloud)
    • Node.js (v20+) and pnpm
    • Docker Desktop or Docker Engine (docker)
  4. Authentication:
    • Authenticate gcloud: gcloud auth login
    • Set up Application Default Credentials (ADC): gcloud auth application-default login
    • Configure gcloud project: gcloud config set project YOUR_PROJECT_ID

⚠️ IMPORTANT: Organization Policy Prerequisite

This setup requires the Cloud Run service to allow unauthenticated invocations. This is achieved by granting the roles/run.invoker role to the special principal allUsers in the Terraform configuration (google_cloud_run_v2_service_iam_member "allow_unauthenticated").

Many organizations block this by default using the Organization Policy constraints/iam.allowedPolicyMemberDomains. If this policy is active and restricts members to specific domains, the terraform apply command will fail with an error similar to:

Error: Error applying IAM policy for cloudrunv2 service ... googleapi: Error 400: One or more users named in the policy do not belong to a permitted customer...

Resolution (Requires Organization Admin privileges):

  1. Navigate to IAM & Admin -> Organization Policies in the GCP Console.
  2. Select your Organization.
  3. Find and select the Domain restricted sharing policy (constraints/iam.allowedPolicyMemberDomains).
  4. Click EDIT.
  5. Choose "Customize" (or "Manage Policy").
  6. For "Applies to", select "Override parent's policy" (or similar option to customize for a specific resource).
  7. Set the enforcement target to your specific Project ID (YOUR_PROJECT_ID).
  8. Configure the policy for this project override to "Allow All" or disable enforcement (effectively removing the domain restriction only for this project).
  9. SAVE the policy change.

If you cannot modify the Organization Policy, this specific deployment approach will not work. You would need to explore alternatives, such as requiring the client to send IAM Bearer tokens directly to Cloud Run (bypassing API Gateway) or implementing a more complex proxy mechanism.

Setup & Deployment (via Terraform & Docker)

  1. Clone the Repository:

    git clone <your-repo-url>
    cd <repo-directory-name> # e.g., vertex-gcp-proxy
  2. Configure Terraform Variables:

    • Create a terraform/secrets.tfvars file with your project_id and the static secret API key:
      # terraform/secrets.tfvars
      project_id       = "YOUR_PROJECT_ID"
      # This is the static secret key the gRPC backend will check via x-api-key metadata
      # For production, consider using Secret Manager instead of putting the key here.
      EXPECTED_API_KEY = "YOUR_SECRET_STATIC_API_KEY_HERE" # Use a strong, unique key
    • IMPORTANT: Add *.tfvars to your .gitignore file (Crucial for security!).
    • Review terraform/variables.tf for other optional configuration.
  3. Initialize Terraform:

    cd terraform
    terraform init
  4. Apply Infrastructure: Review the plan carefully. This provisions the VPC, KMS keys, IAM roles, Artifact Registry, and deploys Cloud Run (initially, possibly with a default or previously pushed image).

    terraform plan -var-file="secrets.tfvars"
    terraform apply -var-file="secrets.tfvars"
    # If this fails with the Org Policy error, address the prerequisite above and retry.

    Navigate back to the project root: cd ..

  5. Install Application Dependencies:

    • Use pnpm (ensure you have Node.js v20+):
      pnpm install
  6. Build & Push Application Container Image:

    • Get the Artifact Registry repository URL (run from project root):
      export REPO_URL=$(terraform -chdir=terraform output -raw artifact_registry_repository_url)
      echo "Repository URL: $REPO_URL"
    • Configure Docker authentication:
      export REPO_HOST=$(echo $REPO_URL | cut -d'/' -f1)
      echo "Configuring auth for: $REPO_HOST"
      gcloud auth configure-docker $REPO_HOST --quiet
    • Build the image (run from project root):
      # Use --platform linux/amd64 to ensure compatibility with Cloud Run's amd64 architecture,
      # especially if building on ARM-based machines (e.g., M1/M2 Macs).
      docker build --platform linux/amd64 -t ${REPO_URL}/vertex-proxy:latest .
    • Push the image:
      docker push ${REPO_URL}/vertex-proxy:latest
  7. Update Cloud Run Service (Deploy Image):

    • Navigate back to the Terraform directory: cd terraform
    • Apply the changes again. Terraform updates the Cloud Run service to use the newly pushed image.
      terraform apply -var-file="secrets.tfvars"
    • Navigate back to the project root: cd ..

Usage

  1. Get Cloud Run Service URL:

    export SERVICE_RAW_URL=$(terraform -chdir=terraform output -raw cloud_run_service_url)
    # Construct the gRPC target (hostname:port)
    export GRPC_TARGET=$(echo $SERVICE_RAW_URL | sed 's|https://||g'):443 
    echo "gRPC Target: $GRPC_TARGET"
  2. Test with the CLI (see the CLI Usage section below for setup). Use the static secret API key you defined in secrets.tfvars, set in src/cli/.env.

    pnpm run cli:chat -p "Explain gRPC streaming."
    pnpm run cli:stream -p "Tell me a short story using gRPC streaming."
  3. Client Configuration:

    • Target Address: Configure your gRPC client to connect to the Cloud Run service hostname on port 443 (e.g., your-service-name-....run.app:443).
    • Credentials: Use standard TLS/SSL credentials (e.g., grpc.credentials.createSsl() in Node.js). Cloud Run's frontend provides the TLS termination.
    • Metadata: Send the static API key in the x-api-key metadata header.
    • Protobuf: Use the src/proto/vertex_proxy.proto definition to generate client stubs or load it dynamically.
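
For example, here is a minimal Node.js client sketch following the configuration above. The package and service names match the vertexproxy.VertexProxy/Chat path mentioned later in this README, but the request field name is an assumption; consult src/proto/vertex_proxy.proto for the real interface:

    import * as grpc from '@grpc/grpc-js';
    import * as protoLoader from '@grpc/proto-loader';

    // Load the proxy's interface definition dynamically.
    const packageDefinition = protoLoader.loadSync('src/proto/vertex_proxy.proto', {
      keepCase: true, longs: String, enums: String, defaults: true, oneofs: true,
    });
    const proto = grpc.loadPackageDefinition(packageDefinition) as any;

    // Cloud Run's frontend terminates TLS, so standard SSL credentials suffice.
    const client = new proto.vertexproxy.VertexProxy(
      process.env.GRPC_TARGET ?? 'your-service-name-....run.app:443',
      grpc.credentials.createSsl(),
    );

    // Send the static API key in the x-api-key metadata header on every call.
    const metadata = new grpc.Metadata();
    metadata.set('x-api-key', process.env.API_KEY ?? '');

    // Unary call to the Chat method (the prompt field name is illustrative).
    client.Chat({ prompt: 'What is Cloud Run?' }, metadata, (err: grpc.ServiceError | null, res: unknown) => {
      if (err) throw err;
      console.log(res);
    });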

Security Considerations

  • IAM Identity for Vertex AI: Cloud Run authenticates to Vertex AI using its managed identity.
  • VPC Network Confinement: Cloud Run operates within your private VPC.
  • CMEK: Container images and runtime state are encrypted.
  • Application-Level Authentication: Client authentication relies entirely on the static API key check within the gRPC server (server.ts), reading the x-api-key metadata. Since Cloud Run allows unauthenticated network invocations in this setup, this check is critical. Ensure the key is strong and kept secret.
  • API Key Management: The current implementation reads the client API key from an environment variable (set via a Terraform variable). For production, it is strongly recommended to store the static API key in Google Secret Manager, grant the Cloud Run service account the roles/secretmanager.secretAccessor role on it, and modify server.ts to fetch the key from Secret Manager on startup instead of relying on process.env.EXPECTED_API_KEY; a sketch follows this list.
  • Firewall Rules: The basic firewall rule allows ingress from all sources to the VPC on the service port. Review and harden as needed (e.g., restrict source_ranges).
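
A minimal sketch of the Secret Manager approach recommended above, using @google-cloud/secret-manager (the secret name is illustrative):

    import { SecretManagerServiceClient } from '@google-cloud/secret-manager';

    // Runs once on startup. The Cloud Run runtime service account must hold
    // roles/secretmanager.secretAccessor on the secret.
    async function loadExpectedApiKey(): Promise<string> {
      const client = new SecretManagerServiceClient();
      const [version] = await client.accessSecretVersion({
        // Illustrative secret name; adjust to however the secret is created.
        name: `projects/${process.env.GCP_PROJECT}/secrets/vertex-proxy-api-key/versions/latest`,
      });
      const key = version.payload?.data?.toString();
      if (!key) throw new Error('API key secret is empty or missing');
      return key;
    }

The server would then compare the x-api-key metadata against the value returned here instead of process.env.EXPECTED_API_KEY.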

Troubleshooting / Logging

  • Monitor service health and troubleshoot issues using Cloud Logging in the GCP Console. Filter logs by the Cloud Run service name (var.cloud_run_service_name).
  • If terraform apply fails due to the Org Policy, see the IMPORTANT: Organization Policy Prerequisite section above.
  • If gRPC connections fail, ensure the target address (hostname:443) is correct, TLS credentials are used, and the x-api-key metadata is being sent.
  • Check server logs for API key validation errors or issues calling the Vertex AI API.

Alternative Architecture Considered: API Gateway with gRPC Passthrough

While the main branch connects gRPC clients directly to the Cloud Run service for potentially better streaming performance, an attempt was made to reintroduce API Gateway in front of the gRPC backend. This was explored primarily to leverage API Gateway features like rate limiting, managed monitoring, and potentially avoiding the allUsers invoke permission on Cloud Run (by having the Gateway authenticate to an internal Cloud Run service using its service account).

The goal was native gRPC passthrough: Client (gRPC) -> API Gateway -> Cloud Run (gRPC).

However, configuring API Gateway for this pattern proved problematic:

  1. Initial OpenAPI Attempt: Using a minimal OpenAPI specification to define the Cloud Run gRPC backend resulted in 404 Not Found errors from API Gateway, indicating it couldn't route the specific gRPC method paths (e.g., /vertexproxy.VertexProxy/Chat) based solely on the OpenAPI definition.
  2. gRPC Service Config Attempt: Switching the Terraform configuration (google_api_gateway_api_config) to use the grpc_services block (with a proto descriptor set and a grpc_service_config.yaml) also failed during Terraform deployment. Errors indicated issues with missing backend rules or duplicate configuration source paths, suggesting that native gRPC passthrough configured this way is either unsupported or requires a more complex, non-obvious setup.

Transcoding Alternative: The standard, documented approach for API Gateway with gRPC backends is HTTP/JSON transcoding (Client (HTTP) -> API Gateway -> Cloud Run (gRPC)). This was not pursued in the main branch because: a) It might reintroduce the streaming buffering issues that the switch to native gRPC aimed to solve. b) It requires clients to use HTTP/SSE again, reverting the client-side implementation.

Conclusion: Due to the configuration difficulties with native gRPC passthrough, the main branch retains the direct Cloud Run connection. This prioritizes the direct gRPC communication path at the expense of built-in API Gateway features.

The code demonstrating the attempt to configure API Gateway with the grpc_services block (Point 2 above) can be found in the feat/api-gateway branch.

Future Improvements

  • Integrate Google Secret Manager for EXPECTED_API_KEY.
  • Implement more sophisticated request/response transformation logic.
  • Enhance security by requiring Google Cloud IAM authentication for clients (e.g., sending ID tokens in metadata) instead of allUsers invoke + static key, if clients can support it.
  • Set up a CI/CD pipeline (e.g., Cloud Build).
  • Add unit and integration tests for the gRPC server (server.ts).
  • Add gRPC health checks.

Project Structure

vertex-gcp-proxy/
├── terraform/          # Terraform configurations
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── providers.tf
│   └── secrets.tfvars  # (GITIGNORED!) For sensitive variables
├── src/                # Node.js/TypeScript source code
│   ├── server.ts       # Main gRPC server implementation
│   ├── proto/          # Protocol Buffer definitions
│   │   └── vertex_proxy.proto
│   └── cli/            # Command-line interface (gRPC client)
│       ├── index.ts    # CLI entry point
│       ├── commands/   # CLI command implementations
│       │   ├── chat.ts # uses @grpc/grpc-js and prompts
│       │   └── stream.ts # uses @grpc/grpc-js and prompts
│       ├── .env        # (GITIGNORED!) gRPC Target/Key for CLI
│       └── .env.example # Example environment variables for CLI
├── .gitignore
├── .vscode/            # VS Code settings (optional, ignored by git)
│   └── settings.json
├── Dockerfile          # Container definition
├── eslint.config.js    # ESLint v9 configuration
├── package.json        # Node dependencies (managed by pnpm)
├── pnpm-lock.yaml      # pnpm lockfile
├── README.md           # This file
└── tsconfig.json       # TypeScript configuration

Development

  • Run the development gRPC server (auto-reloads on changes using tsx):
    # Ensure EXPECTED_API_KEY is set as an environment variable locally for testing
    export EXPECTED_API_KEY="your-dev-key"
    pnpm run dev
  • Lint the code:
    pnpm run lint

CLI Usage

The project includes a command-line interface (src/cli/) for interacting with the deployed Vertex AI gRPC proxy. Interactive modes (-i) use the prompts library for a robust user experience.

  1. Setup:

    • Copy the CLI environment file:
      cp src/cli/.env.example src/cli/.env
    • Edit the src/cli/.env file with your deployed Cloud Run gRPC Target (e.g., your-service-...run.app:443) and the static API key:
      # src/cli/.env
      # Get from 'terraform output -raw cloud_run_service_url' and append :443
      GRPC_TARGET=your-service-name-....run.app:443 
      API_KEY=YOUR_SECRET_STATIC_API_KEY_HERE
  2. Run the CLI (using tsx for direct TS execution):

    • Send a single prompt (unary gRPC):
      pnpm run cli:chat -p "What is Cloud Run?"
    • Start an interactive chat session (unary gRPC):
      pnpm run cli:chat -i
    • Send a single prompt with streaming response (streaming gRPC):
      pnpm run cli:stream -p "Explain Vertex AI Gemini in detail via gRPC stream"
    • Start an interactive chat session with streaming responses (streaming gRPC):
      pnpm run cli:stream -i
  3. (Optional) Install the CLI globally:

    • Build the project first (creates JavaScript output in dist/):
      pnpm run build
    • Link the package globally:
      # Navigate to the project root directory first
      npm link
    • Now you can use it from anywhere. Note: It will look for the src/cli/.env file relative to your current working directory, or use globally set GRPC_TARGET/API_KEY environment variables if the .env file is not found.
      vertex-cli chat -p "Hello, Gemini via gRPC!"
