Hybrid VPC: Deploy an Enterprise AI Firewall Where Prompts Never Leave Your Network
The conversation in every enterprise security review meeting sounds the same:
“We want to use GPT-4 and Claude in our applications. But our compliance team says prompt data can't leave our network. What are our options?”
Until now, the options were binary: use a cloud AI gateway and accept that prompts transit through a third-party proxy, or build your own governance stack from scratch. The first option fails compliance reviews. The second takes 6-12 months and a dedicated engineering team.
Hybrid VPC is the third option: a compiled Go proxy that runs inside your network and handles DLP scanning, PII redaction, prompt injection blocking, and budget enforcement locally, while a cloud dashboard manages policies and displays metadata-only analytics.
The Problem: AI Governance vs. Data Sovereignty
Cloud AI gateways solve the governance problem — they scan prompts for PII, enforce budgets, block injections, and log everything. But they require your prompt data to pass through their servers. For many enterprises, that's a non-starter:
- Healthcare (HIPAA): patient data in prompts can't transit third-party infrastructure
- Financial services (SOX/PCI DSS): cardholder data and financial records must stay within the compliance boundary
- Government / defense: classified or sensitive data has strict data residency requirements
- Legal: attorney-client privilege demands that case details never leave firm infrastructure
Self-hosting an open-source AI proxy solves the data sovereignty problem, but you lose the governance layer: no dashboards, no centralized policy management, no violation tracking, no budget enforcement. You're back to building from scratch.
The Solution: Hybrid VPC Deployment
Hybrid VPC splits the AI firewall into two zones:
Runs in Your VPC
- Compiled Go proxy with built-in DLP engine
- Detection and redaction of 30+ PII entity types
- Prompt injection blocking
- Budget enforcement (per-request & monthly)
- Presidio NER sidecar (open-source)
- Sync agent for policy updates
Stays in Cloud
- Policy management dashboard
- Real-time metrics & violation analytics
- Deployment health monitoring
- Multi-project management
- Budget configuration
- Metadata-only telemetry ingestion
The critical guarantee: prompt and response content never leaves your network. The proxy processes everything locally and forwards cleaned requests directly to your AI provider (OpenAI, Anthropic, Google, etc.). The sync agent pushes only structural metadata to the cloud — token counts, latency measurements, entity type counts, and cost estimates. Content fields are rejected server-side.
How It Works: 5-Step Request Flow
1. Your application sends a request to the hybrid proxy (same network, no external hop).
2. The Go DLP engine scans for PII, applies project-specific policies, and checks budget limits, all locally in under 50 ms.
3. The proxy forwards the cleaned request directly to OpenAI / Anthropic / Google (your provider keys, your network egress).
4. The AI provider responds, and the proxy returns the response to your application.
5. The sync agent pushes metadata (token count, latency, entity types) to the cloud dashboard, with no content.
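Steps 2 and 5 can be illustrated with a minimal Python sketch. The production engine is a compiled Go proxy; Python is used here only to match the integration example. The two regex patterns, the `redact` and `build_telemetry` helpers, and the placeholder format are assumptions for illustration, not the product's detectors or telemetry schema.

```python
import re
from collections import Counter

# Illustrative patterns only; a real DLP engine covers many more entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each match with a typed placeholder and count entity types."""
    counts = Counter()
    for entity, pattern in PATTERNS.items():
        text, n = pattern.subn(f"<{entity}>", text)
        if n:
            counts[entity] += n
    return text, dict(counts)


def build_telemetry(entity_counts, prompt_tokens, latency_ms):
    """Build a metadata-only record: counts and timings, never prompt text."""
    return {
        "entity_counts": entity_counts,
        "prompt_tokens": prompt_tokens,
        "latency_ms": latency_ms,
    }


clean, counts = redact("Email jane@example.com about SSN 123-45-6789.")
record = build_telemetry(counts, prompt_tokens=12, latency_ms=8)
```

Only `clean` would be forwarded to the provider, and only `record` would be handed to the sync agent; the original prompt stays in process memory.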
Integration: Two Lines of Code
The hybrid proxy is OpenAI SDK compatible. Point your base URL to the proxy and use your project API key:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://hybrid-proxy:8080/v1",  # Your local proxy
    api_key="oah_your_project_api_key",
)

# Every request is now DLP-scanned, budget-checked,
# and logged, without any data leaving your network
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Analyze this contract..."}],
)
```

That's it. No SDK changes, no middleware, no code annotations. Every request flowing through the proxy is automatically scanned for 30+ PII entity types, checked against your project's DLP policy, and budget-verified, all before the request reaches your AI provider.
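For clients that cannot use the SDK, the same call can be expressed as a plain HTTP request. The sketch below assumes the proxy exposes the standard OpenAI `/v1/chat/completions` path with Bearer authentication, which SDK compatibility suggests but this article does not confirm. `build_chat_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_content):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "http://hybrid-proxy:8080/v1",
    "oah_your_project_api_key",
    "gpt-4.1",
    "Analyze this contract...",
)
# To send it from inside the network: urllib.request.urlopen(req, timeout=30)
```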
Security Architecture: Defense in Depth
Compiled Go Binary
The proxy ships as a statically-compiled binary in a scratch Docker image. No shell, no OS packages, no runtime interpreter. Even if someone gains container access, there is no shell, package manager, or interpreter to work with, which sharply limits what an attacker can do.
Fail-Closed Architecture
If the DLP engine can't verify a request is safe, it blocks it. Policy errors, detection failures, and budget exhaustion all result in blocked requests — never silent pass-through.
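The fail-closed rule can be sketched in a few lines of Python. This is an illustration of the principle, not the proxy's Go implementation; `guard`, `BlockedRequest`, and the two example checks are hypothetical names. The key point is that an exception inside a check is treated the same as a rejection.

```python
class BlockedRequest(Exception):
    """Raised when a request must not be forwarded."""


def guard(prompt, checks):
    """Return the prompt only if every check runs cleanly and approves it.

    Each check returns True to allow. A non-True result or any exception,
    including a bug in the check itself, blocks the request.
    """
    for check in checks:
        try:
            allowed = check(prompt)
        except Exception as exc:
            raise BlockedRequest(f"{check.__name__} failed: {exc}") from exc
        if allowed is not True:
            raise BlockedRequest(f"{check.__name__} rejected the request")
    return prompt


def no_ssn(prompt):
    # Toy policy check used only for this example.
    return "123-45-6789" not in prompt


def broken_detector(prompt):
    # Simulates a detection failure, e.g. a model that did not load.
    raise RuntimeError("model not loaded")
```

With these definitions, `guard("hello", [no_ssn])` returns the prompt, while `guard("hello", [no_ssn, broken_detector])` raises `BlockedRequest` even though nothing sensitive was found, because the detector could not confirm the request was safe.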
Metadata-Only Telemetry
The sync agent transmits only structural metadata. Content fields are validated and rejected server-side. Even a misconfigured agent can't leak prompt data.
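One way server-side rejection could work is an allowlist that accepts only known numeric fields. The sketch below is an assumption about the approach, not the actual ingestion schema; the field names in `ALLOWED_FIELDS` and the `validate_telemetry` function are hypothetical.

```python
# Hypothetical field names; the real telemetry schema is not documented here.
ALLOWED_FIELDS = {
    "prompt_tokens",
    "completion_tokens",
    "latency_ms",
    "entity_counts",
    "cost_estimate_usd",
}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_telemetry(payload):
    """Accept only allowlisted numeric metadata; reject everything else."""
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for key, value in payload.items():
        if key == "entity_counts":
            ok = isinstance(value, dict) and all(
                isinstance(name, str) and _is_number(count)
                for name, count in value.items()
            )
        else:
            ok = _is_number(value)
        if not ok:
            raise ValueError(f"field {key!r} must be numeric metadata")
    return payload
```

Because free-text values fail the type check and unknown keys fail the allowlist, a payload such as `{"prompt": "..."}` or `{"latency_ms": "some text"}` is refused before it is stored.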
Local Budget Enforcement
Monthly budgets and per-request token limits are enforced locally. No cloud round-trip for budget checks. Limits are applied instantly, even during cloud connectivity interruptions.
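A minimal sketch of local enforcement, assuming the limits were synced earlier and the running spend is tracked in process. `LocalBudget` and its fields are hypothetical names, and the dollar and token figures are example values only.

```python
from dataclasses import dataclass


@dataclass
class LocalBudget:
    monthly_limit_usd: float
    max_tokens_per_request: int
    spent_usd: float = 0.0

    def check(self, request_tokens, estimated_cost_usd):
        """Return (allowed, reason) using only in-process state."""
        if request_tokens > self.max_tokens_per_request:
            return False, "per-request token limit exceeded"
        if self.spent_usd + estimated_cost_usd > self.monthly_limit_usd:
            return False, "monthly budget exhausted"
        return True, "ok"

    def record(self, actual_cost_usd):
        """Add the cost of a completed request to the running total."""
        self.spent_usd += actual_cost_usd


budget = LocalBudget(monthly_limit_usd=10.0, max_tokens_per_request=4000)
first = budget.check(request_tokens=1000, estimated_cost_usd=0.5)
budget.record(9.75)
second = budget.check(request_tokens=1000, estimated_cost_usd=0.5)
```

Neither call touches the network, which is why the decision still works while the cloud dashboard is unreachable: `first` is allowed, and `second` is refused because the request would push spend past the monthly limit.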
Who Is Hybrid VPC For?
Hybrid VPC is designed for organizations that need full AI governance but can't send prompt data through a third-party cloud proxy:
- Enterprises in regulated industries (healthcare, finance, legal, government)
- Organizations with strict data residency requirements
- Teams that need to demonstrate data sovereignty for compliance audits
- Companies deploying AI in environments with restricted outbound network access
- Security-conscious organizations that want defense-in-depth for AI workloads
How Does It Compare?
| Capability | Cloud Gateway | Self-Host OSS | Hybrid VPC |
|---|---|---|---|
| Prompt stays on-prem | No | Yes | Yes |
| Cloud dashboard | Yes | No | Yes |
| 30+ PII entities | Yes | 13 | Yes |
| Budget enforcement | Yes | No | Yes |
| Policy management UI | Yes | No | Yes |
| Setup time | 5 minutes | 30 minutes | 15 minutes |
| Ongoing maintenance | None | You | Minimal |
Getting Started
Hybrid VPC is available on the Enterprise plan. Here's how to get started:
1. Contact us at enterprise@aisecuritygateway.ai to enable Hybrid VPC on your account.
2. We provision your deployment token and container registry access.
3. You deploy three Docker containers in your VPC using our Docker Compose template.
4. You point your app's base URL to the proxy: two lines of code.
Ready to deploy an AI Firewall in your VPC?
Full DLP, budget enforcement, and policy management — without sending a single prompt to the cloud.